On Subspace Methods in System Identification and Sensor Array Signal Processing
Magnus Jansson

TRITA–S3–REG–9701
ISSN 0347–1071

Automatic Control
Department of Signals, Sensors and Systems
Royal Institute of Technology (KTH)
Stockholm, Sweden

Submitted to the School of Electrical Engineering, Royal Institute of Technology, in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Copyright © 1997 by Magnus Jansson
Printed by KTH Tryck & Kopiering 1997
Abstract

The common theme of the thesis is the derivation and analysis of subspace parameter estimation methods. In particular, this thesis is concerned with the identification of linear time-invariant dynamical systems and the estimation of the directions of arrival (DOAs) using sensor arrays.

The first part focuses on sensor array signal processing. The forward-backward approach has proved quite useful for improving the performance of many suboptimal array processing methods. Herein, it is shown that this approach should not be used with a statistically optimal method such as MODE or WSF.

A robust weighted subspace fitting method for a wide class of array perturbation models is derived. The method takes the first-order errors due to finite sample effects of the noise and the model perturbation into account in an optimal way to give minimum variance estimates of the DOAs. Interestingly enough, the method reduces to the WSF (MODE) estimator if no model errors are present. On the other hand, when model errors dominate, the proposed method turns out to be equivalent to the "model-errors-only subspace fitting method". Furthermore, an alternative method, known as MAPprox, is also shown to yield minimum variance estimates.

A novel subspace-based algorithm for the estimation of the DOAs of uncorrelated emitter signals is derived. The asymptotic variance of the estimates is shown to coincide with the relevant Cramér–Rao lower bound. This implies that the method in certain cases has significantly better performance than methods that do not exploit the a priori information about the signal correlation. Unlike previous techniques, the estimator can be implemented using standard matrix operations only, if the array is uniform and linear.

The second part of the thesis deals with the analysis and interpretation of subspace methods for system identification. It is shown that several existing subspace estimates result from a linear regression multistep-ahead prediction error minimization under certain rank constraints. Furthermore, it is found that the pole estimates obtained by two subspace system identification methods are asymptotically invariant to a row weighting in the cost functions. The consistency of a large class of subspace estimates is analyzed. Persistence of excitation conditions on the input signal are given which guarantee consistency for systems with only measurement noise. For systems with process noise, an example is given where subspace methods fail to yield a consistent estimate of the transfer function. This failure occurs even though the input is persistently exciting. Such problems do not arise if the input is a white noise process or an ARMA process of sufficiently low order.
Acknowledgments

The years at the Department of Signals, Sensors & Systems have been most enjoyable and stimulating. I would like to express my gratitude to my supervisor, Professor Bo Wahlberg, who convinced me to continue from the master studies towards the Ph.D. degree. Once started, I have never regretted that choice. Thank you, Bo, for always being positive and very encouraging.

It has been a pleasure to go to the department (almost) every morning during the last five years. Thanks to all colleagues at the department, former and present, for being who you are. I have enjoyed our discussions at lunches and coffee breaks, playing "innebandy", and ...

To all friends, I apologize for often being absent-minded during these years. I appreciate your support.

Many thanks to my collaborators and co-authors, Professors Björn Ottersten, Petre Stoica, Lee Swindlehurst, and Bo Wahlberg, and Tekn. Lic. Bo Göransson. The interaction has been very stimulating for me and rewarding for my work.

The financial support that made this thesis project possible is gratefully acknowledged.

Finally, I am very grateful to my family for their continuous support and encouragement. I dedicate this thesis to my family.
Contents

1 Introduction
  1.1 Sensor Array Signal Processing
    1.1.1 Data Model
    1.1.2 Estimation Methods
    1.1.3 Discussion
  1.2 System Identification
    1.2.1 Background
    1.2.2 Subspace Methods
    1.2.3 Discussion
  1.3 Outline and Contributions
  1.4 Topics for Future Research

I Sensor Array Signal Processing

2 On Forward-Backward MODE for Array Signal Processing
  2.1 Introduction
  2.2 Data Model
  2.3 Preliminary Results
    2.3.1 Results on A and B
    2.3.2 Results on R
    2.3.3 Results on P and R
    2.3.4 Results on R0 and R
    2.3.5 Results on ABC Estimation
  2.4 Brief Review of F-MODE
  2.5 Derivation of FB-MODE
  2.6 Analysis of FB-MODE
  2.7 Numerical Examples
  2.8 Conclusions

3 Weighted Subspace Fitting for General Array Error Models
  3.1 Introduction
  3.2 Problem Formulation
  3.3 Robust Estimation
    3.3.1 MAP
    3.3.2 MAPprox
    3.3.3 MAP-NSF
    3.3.4 Cramér–Rao Lower Bound
  3.4 Generalized Weighted Subspace Fitting (GWSF)
  3.5 Performance Analysis
    3.5.1 Consistency
    3.5.2 Asymptotic Distribution
  3.6 Implementation
    3.6.1 Implementation for General Arrays
    3.6.2 Implementation for Uniform Linear Arrays
  3.7 Performance Analysis of MAPprox
  3.8 Simulation Examples
    3.8.1 Example 1
    3.8.2 Example 2
    3.8.3 Example 3
    3.8.4 Example 4
  3.9 Conclusions

4 A Direction Estimator for Uncorrelated Emitters
  4.1 Introduction
  4.2 Data Model and Assumptions
  4.3 Estimating the Parameters
  4.4 The CRB for Uncorrelated Sources
  4.5 Asymptotic Analysis
    4.5.1 Consistency
    4.5.2 Asymptotic Distribution
    4.5.3 Optimal Weighting
  4.6 Implementation
  4.7 Numerical Examples
  4.8 Conclusions

II Subspace System Identification

5 A Linear Regression Approach to Subspace System Identification
  5.1 Introduction
  5.2 Model and Assumptions
    5.2.1 4SID Basic Equation
    5.2.2 General Assumptions
  5.3 Linear Regression
    5.3.1 Motivation
    5.3.2 4SID Linear Regression Model
  5.4 Subspace Estimation
    5.4.1 Weighted Least Squares (WLS)
    5.4.2 Approximate Maximum Likelihood (AML)
    5.4.3 Discussion
  5.5 Pole Estimation
    5.5.1 The Shift Invariance Method
    5.5.2 Subspace Fitting
  5.6 Performance Analysis
    5.6.1 Consistency
    5.6.2 Analysis of the Shift Invariance Method
    5.6.3 Analysis of the Subspace Fitting Method
  5.7 Examples
  5.8 Conclusions

6 On Consistency of Subspace Methods for System Identification
  6.1 Introduction
  6.2 Preliminaries
  6.3 Background
    6.3.1 Basic 4SID
    6.3.2 IV-4SID
  6.4 Analysis of Basic 4SID
  6.5 Analysis of IV-4SID
  6.6 Counterexample
  6.7 Numerical Examples
  6.8 Conclusions

A Proofs and Derivations
  A.1 Proofs of (2.34) and (2.35)
  A.2 Asymptotic Covariance of the Residual
  A.3 Proof of Theorem 3.2
  A.4 Some Properties of the Block Transpose
  A.5 Proof of Theorem 4.1
  A.6 Rank of B(θ)
  A.7 Derivation of the Asymptotic Distribution
  A.8 Proof of Theorem 4.3
  A.9 Proof of Expression (5.22)
  A.10 Proof of Expression (5.27)
  A.11 Proof of Theorem 5.1
  A.12 Proof of Theorem 5.3
  A.13 Proof of Theorem 5.5
  A.14 Rank of Coefficient Matrix
  A.15 Rank of Sylvester Matrix

B Notations

C Abbreviations and Acronyms

Bibliography
Chapter 1

Introduction

The subject of this thesis is so-called subspace parameter estimation methods. Two areas of application are considered. This implies that the thesis naturally divides into two main parts. Part I considers the sensor array signal processing problem and Part II is concerned with the identification of linear time-invariant dynamical systems. This chapter contains a short common introduction to these two parts.
1.1 Sensor Array Signal Processing

The basic goal of sensor array signal processing is to extract information from data obtained from multiple sensors. Consider the situation depicted in Figure 1.1. The emitters transmit energy (typically electromagnetic or acoustic) from different locations. The wavefield measured by the sensors contains information about, for example, the directions to the sources. By combining the information from the different sensors in a clever way, one may obtain accurate estimates of the directions. Applications of sensor array techniques include radar systems, underwater surveillance by hydrophones, seismology, and radio or satellite communications. Currently, there is also considerable interest in using so-called smart antennas for mobile communication. The idea is to obtain diversity by making use of multiple antennas.
The research in the field of array signal processing is by now quite mature. Over the last thirty years numerous articles have been published, and there are excellent reviews of the obtained results (see, e.g., [KV96, VB88, RA92, VDS93]). There are also several books written, including [Pil89, JD93, Hay91, SM97]. The purpose of this section is not to give a comprehensive and detailed introduction (the curious reader is referred to the cited material), but rather to give a hint of where the research presented in the thesis starts.

[Figure 1.1: The setup for the direction estimation problem. Sources 1, 2, ..., d transmit from the directions θ_1, θ_2, ..., θ_d toward an array of m sensors.]
1.1.1 Data Model

Consider again the situation in Figure 1.1. To develop a convenient mathematical model, one has to make several simplifications. The most important assumption made in the following is that the source signals are narrowband. This means that the signals have bandpass characteristics around a given center or carrier frequency. Such signals are common in communications, where the informative (low-frequency) signal is modulated by a high-frequency carrier before transmission. A complex representation of a narrowband signal s(t) is given by

    s(t) = s̄(t) e^{jω_c t},

where ω_c is the carrier angular frequency. The narrowband assumption implies that the amplitude or envelope s̄(t) varies much more slowly than s(t). Hence, for small time delays τ,

    s(t − τ) = s̄(t − τ) e^{jω_c(t−τ)} ≈ s̄(t) e^{jω_c t} e^{−jω_c τ} = s(t) e^{−jω_c τ}.

That is, for narrowband signals we may model a small time delay by a phase shift.
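The delay-to-phase-shift approximation is easy to check numerically. The sketch below uses illustrative values (1 kHz envelope, 1 MHz carrier, 1 µs delay; none taken from the thesis) and compares the exactly delayed signal with the phase-shifted one:

```python
import numpy as np

# Illustrative values (not from the thesis).
fc = 1e6                       # carrier frequency [Hz]
wc = 2 * np.pi * fc            # carrier angular frequency omega_c
tau = 1e-6                     # small time delay [s]
t = 0.5e-3                     # observation instant [s]

s_bar = lambda u: np.exp(2j * np.pi * 1e3 * u)  # slowly varying envelope
s = lambda u: s_bar(u) * np.exp(1j * wc * u)    # narrowband signal

exact = s(t - tau)                       # truly delayed signal
approx = s(t) * np.exp(-1j * wc * tau)   # delay modeled as a phase shift

# The error equals |s_bar(t - tau) - s_bar(t)|, which is small because
# the envelope barely changes over tau.
print(abs(exact - approx))
```

The error is on the order of ω_envelope·τ, confirming that the approximation improves as the envelope bandwidth shrinks relative to the carrier.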
Let H_k(ω, θ) denote the (complex) response of the kth sensor at the frequency ω to a wavefield from the direction θ. Furthermore, assume that H_k(ω, θ) ≈ H_k(ω_c, θ) over the passband of interest around ω_c. Let τ_k denote the time delay of the signal s(t) at the kth sensor relative to a reference point. Then the output of sensor k can be written as

    x_k(t) = H_k(ω_c, θ) s(t − τ_k) + n_k(t) = H_k(ω_c, θ) s(t) e^{−jω_c τ_k} + n_k(t),    (1.1)

where θ is the direction to the source emitting the signal s(t). In (1.1), n_k(t) is an additive noise term capturing sensor thermal noise, background noise, model errors, etc.

In essence, the narrowband assumption is used to make s̄(t) approximately constant during the time it takes for the wave to propagate across the array, and to approximate the sensor response by a constant.

The sources are further assumed to be in the far field, so that the signal wavefronts are planar when impinging on the array. This makes it possible to relate the time delay to the direction of arrival θ. To see this, consider the following example.
Example 1.1. A common sensor array is the uniform linear array (ULA). In the ULA configuration, the sensors are distributed along a line with equal distance between the sensor elements. Figure 1.2 illustrates the geometry when a plane wave impinges on a ULA with a separation of Δ between two adjacent sensors. Here, we consider the sensors and the sources to be in the same plane. In Figure 1.2, τ is the time it takes for the wavefront to propagate between two adjacent sensors and c is the speed of propagation. The signal impinges from the direction θ, measured relative to the normal of the array. As shown in Figure 1.2, the relation between τ and θ is given by

    cτ = Δ sin(θ).

[Figure 1.2: A plane wave from the direction θ impinges on a ULA of 5 sensors separated by the distance Δ. The figure illustrates geometrically the propagation delay between two adjacent sensors.]
If there are d signals {s_k(t)}_{k=1}^{d}, then the total contribution at the ith sensor is (cf. (1.1))

    x_i(t) = ∑_{k=1}^{d} H_i(ω_c, θ_k) s_k(t) e^{jω_c(i−1)Δ sin(θ_k)/c} + n_i(t),

where the first sensor is taken as the reference point. If the sensors are omni-directional, then H_i(ω_c, θ_k) does not depend on θ_k. Furthermore, if the sensors are assumed to be identical, then the signal can be redefined to also include the sensor response. If the array comprises m sensors,
then the outputs can be written in a vector,

    x(t) = [x_1(t), x_2(t), ..., x_m(t)]^T
         = ∑_{k=1}^{d} [1, e^{jω_cΔ sin(θ_k)/c}, ..., e^{jω_cΔ(m−1) sin(θ_k)/c}]^T s_k(t) + [n_1(t), n_2(t), ..., n_m(t)]^T
         = ∑_{k=1}^{d} a(θ_k) s_k(t) + n(t)
         = [a(θ_1) ··· a(θ_d)] [s_1(t), ..., s_d(t)]^T + n(t)
         = A(θ) s(t) + n(t),

where θ = [θ_1, θ_2, ..., θ_d]^T. The vector a(θ) is often referred to as the array response or the steering vector in the direction θ, and the matrix A(θ) is called the steering matrix.
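As a concrete sketch of the ULA model just derived, the following code builds the steering matrix A(θ) and generates one noisy snapshot x(t) = A(θ)s(t) + n(t). The numbers are illustrative (not from the thesis); the spacing is chosen so that ω_cΔ/c = π, i.e. half-wavelength spacing.

```python
import numpy as np

m, d = 5, 2                    # sensors and sources (illustrative)
phase = np.pi                  # omega_c * Delta / c for half-wavelength spacing

def steering_vector(theta):
    """a(theta) for a ULA with the first sensor as phase reference."""
    return np.exp(1j * phase * np.arange(m) * np.sin(theta))

theta = np.array([0.1, 0.4])   # DOAs in radians
A = np.column_stack([steering_vector(th) for th in theta])  # m x d steering matrix

rng = np.random.default_rng(0)
s = (rng.standard_normal(d) + 1j * rng.standard_normal(d)) / np.sqrt(2)
n = 0.1 * (rng.standard_normal(m) + 1j * rng.standard_normal(m)) / np.sqrt(2)
x = A @ s + n                  # one array snapshot

print(A.shape, x.shape)        # (5, 2) (5,)
```

Note that every entry of A has unit modulus and the first row is all ones, reflecting the choice of the first sensor as the phase reference.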
Similarly to the development in Example 1.1, the array output can in general be modeled as

    x(t) = A(θ) s(t) + n(t).    (1.2)

The parameter vector θ can be associated with several parameters for each signal, for example, elevation, azimuth, range, polarization, and carrier frequency. Typically, we will simplify matters and only consider one parameter per signal, for example, the azimuth (the direction of arrival) as in the example above.

In the receiver equipment of the sensor array, the modulated signals in (1.2) are demodulated. The demodulation process extracts the (low-frequency) baseband signal that contains the information. By a slight abuse of notation, we still use the model (1.2) for the demodulated signals. Even though the derivation of the model above is somewhat sloppy, the idea should be clear. The interested reader is referred to [Tre71] for more details on the complex signal representation and to [Vib89, Ott89, SM97] for derivations of the array model.

Equation (1.2) constitutes one vector sample of the array output; that is, the m array outputs are sampled simultaneously. One vector sample or one observation of the vector process {x(t)} is sometimes called a snapshot of the array output. Assuming that N snapshots are available, the data may then be written in matrix form as

    X_N = [x(1), ..., x(N)] = A(θ) S_N + N_N.

Here, S_N and N_N are the collections of signal and noise vectors, respectively, defined analogously to X_N. The objective of the sensor array estimation problem can be stated as follows:

Based on the data, X_N, and the model for the observations, (1.2), estimate the number of signals, d, the signal parameters, θ, the signal waveforms, S_N (or the signal covariance matrix P = E{s(t)s*(t)}), and the noise variance, σ². (See the next section for the stochastic model assumptions.)

The major concern in this thesis is the estimation of the signal parameters, θ. Once θ is known, it is relatively straightforward to estimate the other parameters of interest (see, e.g., [Böh86, ORK89]). The number of signals, d, is assumed to be known. Literature addressing the estimation of d includes, for example, [Fuc92, Wax91, VOK91b, VOK91a] and the references therein.
1.1.2 Estimation Methods

The early approaches to making use of the information obtained by the sensor array involved so-called beamforming (see [VB88]). In short, these methods form a weighted sum of the outputs of the array elements to give a measure of the received energy from different directions. This gives a spatial spectrum from which θ can be estimated. These methods can be classified as "non-parametric" since they do not make any assumptions on the covariance of the data [SM97]. In the remainder of this section, some of the "parametric" array processing methods will be described. The methods presented are only those that serve as a background to the material in the thesis.
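A minimal sketch of such a beamformer, under the ULA model of Example 1.1 with half-wavelength spacing and illustrative scenario parameters (not from the thesis): the conventional (Bartlett) spatial spectrum a*(θ) R̂ a(θ)/m² is scanned over a grid of candidate directions, and its peaks indicate the DOAs.

```python
import numpy as np

rng = np.random.default_rng(0)
m, N = 8, 200                                # sensors, snapshots (illustrative)
true_doas = np.array([-0.3, 0.5])            # DOAs in radians (illustrative)

def a(theta):
    """ULA steering vector, half-wavelength spacing."""
    return np.exp(1j * np.pi * np.arange(m) * np.sin(theta))

# Simulate N snapshots: X = A S + noise.
A = np.column_stack([a(th) for th in true_doas])
S = (rng.standard_normal((2, N)) + 1j * rng.standard_normal((2, N))) / np.sqrt(2)
W = 0.1 * (rng.standard_normal((m, N)) + 1j * rng.standard_normal((m, N))) / np.sqrt(2)
X = A @ S + W

R_hat = X @ X.conj().T / N                   # sample covariance matrix
grid = np.linspace(-np.pi / 2, np.pi / 2, 721)
spectrum = np.array([(a(th).conj() @ R_hat @ a(th)).real for th in grid]) / m**2

peak = grid[np.argmax(spectrum)]             # strongest direction on the grid
print(round(float(peak), 2))
```

The scan locates the strongest source to within the grid resolution; resolving closely spaced sources, however, is limited by the array aperture, which is one motivation for the parametric methods described next.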
Maximum Likelihood

The maximum likelihood (ML) method is a powerful tool for parameter estimation. It provides a systematic procedure for many estimation problems. ML estimates have, under mild conditions, strong statistical properties. For example, if the ML estimate is consistent¹, then it is also asymptotically statistically efficient. Here, asymptotically statistically efficient means that the ML estimator attains the Cramér–Rao lower bound (CRB) asymptotically in the number of samples. The CRB is a lower bound on the estimation error variance for any unbiased estimator [Cra46].

¹ A parameter estimate is said to be (strongly) consistent if it converges to the true value with probability one as the number of observations tends to infinity.
A common assumption on the noise n(t) is that it is a zero-mean stationary complex Gaussian random process [Goo63] with second-order moments

    E{n(t)n*(s)} = σ² I δ_{t,s},    E{n(t)n^T(s)} = 0.

Here, the superscript * denotes complex conjugate transpose, δ_{t,s} is the Kronecker delta, and I denotes the identity matrix. Regarding the signal waveforms, {s(t)}, two common assumptions exist. Sometimes {s(t)} are treated as completely unknown complex parameters; this is called the deterministic approach. However, for estimator design purposes it has been shown that a stochastic model for the signals may be useful, whether or not the true signals are stochastic in nature. In what follows, the signal vector s(t) is assumed to be a zero-mean stationary complex Gaussian random process with second-order moments

    E{s(t)s*(s)} = P δ_{t,s},    E{s(t)s^T(s)} = 0.

Under the assumption that the noise and the emitter signals are uncorrelated, the array output is complex Gaussian with zero mean and covariance matrix

    R = E{x(t)x*(t)} = A(θ) P A*(θ) + σ² I.    (1.3)

The assumption of Gaussian emitter signals deserves a comment. In [OVK90, SN90] it is shown that the exact distribution is immaterial; the important quantity that determines the asymptotic properties of the signal parameter estimates is lim_{N→∞} S_N S_N*/N.
From the assumptions on the signals and noise, the array response data, x(t), will be complex m-variate Gaussian with zero mean and the covariance matrix given in (1.3). The probability density function is given by [Goo63]

    f_X(x(t)) = (1 / (π^m det{R})) e^{−x*(t) R^{−1} x(t)},    (1.4)

where det{R} denotes the determinant of R. The distribution depends on the unknown parameters in θ, P, and σ². Introduce the parameter vector
p with the d² real-valued parameters of the Hermitian signal covariance matrix P; that is,

    p = [P_11, ..., P_dd, Re{P_12}, Im{P_12}, ..., Re{P_(d−1)d}, Im{P_(d−1)d}]^T,

where P_ij denotes the (i, j)th element of P. Furthermore, let η be the vector of all unknown parameters,

    η = [θ^T, p^T, σ²]^T.
The probability density function of one observation x(t) is given by (1.4). Since the snapshots are independent and identically distributed, the joint probability density of the N observations is

    f(x(1), ..., x(N) | η) = ∏_{t=1}^{N} (1 / (π^m det{R})) e^{−x*(t) R^{−1} x(t)}.    (1.5)

Equation (1.5) is a measure of the probability that the outcome should take on the values x(1), ..., x(N) given η. When the observed values of x(t) are inserted in (1.5), a function of η is obtained. This function is called the likelihood function and is denoted by f(η). The ML estimate of η is given by the maximizing argument of f(η); that is, the value that makes the likelihood of the outcome as large as possible. Maximizing f(η) is equivalent to minimizing the negative logarithm of f(η),

    −log{f(η)} = −∑_{t=1}^{N} log( (1 / (π^m det{R})) e^{−x*(t) R^{−1} x(t)} )
               = mN log(π) + N log(det{R}) + ∑_{t=1}^{N} x*(t) R^{−1} x(t).
Ignoring the constant term and normalizing by N, the criterion function is given by

    l(η) = log(det{R}) + (1/N) ∑_{t=1}^{N} x*(t) R^{-1} x(t)
         = log(det{R}) + (1/N) ∑_{t=1}^{N} Tr{R^{-1} x(t) x*(t)}
         = log(det{R}) + Tr{R^{-1} R̂},                               (1.6)
where Tr{·} is the trace operator [Gra81] and R̂ is the sample covariance matrix,

    R̂ = (1/N) ∑_{t=1}^{N} x(t) x*(t).

The ML estimate is thus given by the following optimization problem:

    η̂ = arg min_η l(η).                                              (1.7)
This is a high-dimensional nonlinear optimization problem with, at least, d² + d + 1 parameters. It is possible to decrease this dimension through a separation of the problem as shown in [Böh86, Jaf88]. Therein, Equation (1.7) is solved explicitly with respect to the signal and noise covariance parameters. For a given θ, this results in

    P̂(θ) = A†(θ) (R̂ − σ̂²(θ)I) A†*(θ),                               (1.8)
    σ̂²(θ) = 1/(m − d) Tr{Π⊥_A(θ) R̂},                                 (1.9)

where the notations A† = (A*A)^{-1} A* and Π⊥_A = I − AA† are introduced. If (1.8) and (1.9) are substituted back in (1.6), the following concentrated criterion function is obtained (after some algebra):

    V(θ) = log det{A(θ) P̂(θ) A*(θ) + σ̂²(θ)I},

and the ML estimate of the DOAs is

    θ̂ = arg min_θ V(θ).                                              (1.10)

Then, the problem is reduced to a d-dimensional non-linear search over the DOAs in the vector θ. The ML estimates of P and σ² are obtained by inserting (1.10) in (1.8) and (1.9), respectively.
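As an illustration, the concentrated criterion (1.8)–(1.10) is straightforward to evaluate numerically. The following is a minimal NumPy sketch, assuming a half-wavelength ULA steering model (as in Example 1.1); the function names are ours, not from the thesis.

```python
import numpy as np

def steering(theta, m):
    """Steering matrix of a half-wavelength ULA; theta in radians (assumed model)."""
    k = np.arange(m)[:, None]
    return np.exp(1j * np.pi * k * np.sin(np.asarray(theta))[None, :])

def concentrated_ml(theta, R_hat):
    """Concentrated stochastic ML criterion V(theta), cf. (1.8)-(1.10)."""
    m, d = R_hat.shape[0], len(theta)
    A = steering(theta, m)
    A_pinv = np.linalg.solve(A.conj().T @ A, A.conj().T)        # A† = (A*A)^{-1} A*
    Pi_perp = np.eye(m) - A @ A_pinv                            # Π⊥_A
    sigma2 = np.trace(Pi_perp @ R_hat).real / (m - d)           # (1.9)
    P_hat = A_pinv @ (R_hat - sigma2 * np.eye(m)) @ A_pinv.conj().T   # (1.8)
    R_model = A @ P_hat @ A.conj().T + sigma2 * np.eye(m)
    return np.linalg.slogdet(R_model)[1]                        # log det{...}
```

For the exact covariance R = A(θ0)PA*(θ0) + σ²I, the criterion attains its minimum, log det{R}, at the true DOAs.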
We will not further elaborate on the direct ML approach; the presentation above serves as comparison and background for the methods discussed in the subsequent sections. A final remark about the derivations above concerns the name ML. The presentation given here results in an ML estimator sometimes called the stochastic or unconditional ML. There also exists a different ML estimator called the deterministic or conditional ML. The distinction stems from the two different choices of model for the emitter signals mentioned above. For a discussion of the differences between these approaches, see [SN90, OVSN93].
Subspace Methods

Since the introduction of an eigenstructure technique by [Pis73] for the harmonic retrieval problem, there has been great interest in such methods in the signal processing literature. However, the major breakthrough came a few years later with the development of the MUSIC (multiple signal classification) algorithm by Schmidt [Sch81, Sch79] (see also [Bie79]). The eigenstructure or subspace methods exploit a geometrical formulation of the problem by utilizing the eigendecomposition of the array covariance matrix. Here, we just mention that the ideas behind the subspace methods are closely tied to principal component and factor analysis [Hot33, Law40]. The popularity of the subspace methods is due to the fact that they have much better resolution than classical beamforming techniques; that is, they have a good ability to resolve sources that are close in space (angle). The ML method presented in the previous section also has high resolution, but it is computationally more involved than the subspace methods. Three well-known subspace methods will be described below. First, a description of the eigendecomposition of the array output covariance matrix is needed.
Consider the covariance matrix

    R = A(θ) P A*(θ) + σ² I.

Recall that there are m sensors and d signals. Let d′ denote the rank of P. If d′ < d, this indicates that some of the signals are fully correlated, or coherent.² Let the eigendecomposition of R be

    R = ∑_{i=1}^{m} λ_i e_i e_i*,                                     (1.11)

where the eigenvalues are assumed to be ordered as λ1 ≥ λ2 ≥ · · · ≥ λm. Notice that, since R is Hermitian, the eigenvectors can be chosen orthonormal (e_i* e_k = δ_{i,k}). The rank of A(θ)PA*(θ) is d′, which implies that there are m − d′ eigenvectors in its nullspace. These eigenvectors all have a corresponding eigenvalue equal to σ². The remaining d′ eigenvalues are larger than the noise variance. With these observations made, the relation (1.11) may be partitioned as

    R = Es Λs Es* + En Λn En*,

² Two signals are coherent if the first signal is a scaled (by a complex constant) version of the other.
where Λs = diag[λ1, ..., λd′] and Λn = diag[λd′+1, ..., λm] = σ² I. The notation diag[·] is used to denote a diagonal matrix with specified diagonal elements. The matrix Es is m × d′ and contains the d′ “signal” eigenvectors corresponding to the eigenvalues in Λs. Similarly, En is m × (m − d′) and consists of the “noise” eigenvectors corresponding to Λn. The following geometrical observation is at the heart of all subspace methods:

    R{Es} ⊆ R{A(θ)},                                                 (1.12)

where R{·} denotes the range of the matrix in question. The relation (1.12) says that the subspace spanned by the columns of Es is contained in or equal to the subspace spanned by the columns of A(θ). Equality in (1.12) holds if and only if there are no coherent signals (d′ = d). If d′ = d, then (1.12) implies that

    N{En*} = R{A(θ)},                                                (1.13)

where N{En*} denotes the null-space of En*. That is, the columns of A(θ) span the null-space of En*. This observation forms the basis of the MUSIC algorithm presented next.
MUSIC

Let a(θ) denote a generic column of A(θ). In view of (1.13),

    a*(θ) En En* a(θ) = 0                                            (1.14)

for θ = θk (k = 1, ..., d), where {θk} are the true DOAs. In the MUSIC algorithm, the noise eigenvectors are estimated from the sample covariance matrix, and the function

    1 / (a*(θ) Ên Ên* a(θ))

is computed over the sector of interest. The DOAs are estimated as the locations of the d highest peaks. For uniform linear arrays,³ MUSIC can be implemented in a “rooting” form (root-MUSIC) [Bar83]. In that case, the solutions to (1.14) are computed from the roots of a polynomial. The performance of MUSIC is in many cases excellent. However, for scenarios with strongly correlated and closely spaced emitters, or for small samples, it may be far from optimal.

³ Actually, this can easily be extended to hold for all linear arrays where the distances between the sensors are rational.
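To make the recipe concrete, here is a small NumPy sketch of the MUSIC pseudospectrum for a half-wavelength ULA; the array model and function name are our assumptions, since the thesis does not prescribe a particular implementation.

```python
import numpy as np

def music_spectrum(R_hat, d, grid):
    """MUSIC pseudospectrum 1/(a*(θ) Ên Ên* a(θ)) on a grid of angles (rad),
    assuming a half-wavelength ULA with m = R_hat.shape[0] sensors."""
    m = R_hat.shape[0]
    _, E = np.linalg.eigh(R_hat)               # eigenvalues in ascending order
    En = E[:, :m - d]                          # noise eigenvectors
    k = np.arange(m)[:, None]
    A = np.exp(1j * np.pi * k * np.sin(grid)[None, :])   # steering vectors
    return 1.0 / np.sum(np.abs(En.conj().T @ A) ** 2, axis=0)
```

The d DOA estimates are then read off as the locations of the d highest peaks of the returned spectrum.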
ESPRIT

ESPRIT (estimation of signal parameters by rotational invariance techniques) [PRK86, RPK86, RK89] (see also [Kun81, KAB83]) exploits the geometry of a particular type of array. It is assumed that the array consists of two identical subarrays displaced by a known distance ∆ in a known direction. A simple example of such an array is the ULA (see Example 1.1). Define

    φk = (ωc ∆/c) sin(θk)   (k = 1, ..., d),                         (1.15)
    A1 = [I_{m−1}  0] A,
    A2 = [0  I_{m−1}] A,

where 0 is an (m − 1) × 1 vector of zeros, and notice that

    A2 = A1 Φ,                                                       (1.16)

where

    Φ = diag[e^{jφ1}, ..., e^{jφd}].

Clearly, the eigenvalues of Φ determine the DOAs (under some identifiability conditions not discussed here). If d′ = d, then the relation (1.12) implies that there exists a nonsingular matrix T such that A = Es T. By defining

    Es1 = [I_{m−1}  0] Es,
    Es2 = [0  I_{m−1}] Es,

and making use of (1.16), the following relation is obtained:

    Es2 = Es1 Ψ,                                                     (1.17)

where Ψ = T Φ T^{-1}. Observe that the matrices Φ and Ψ have the same eigenvalues since they are related by a similarity transformation. Hence, given an estimate of Es, one can solve (1.17) for an estimate of Ψ. The directions can then be estimated through the relation between (the angles of) the eigenvalues of Ψ and the DOAs given above. The performance of ESPRIT is in most cases similar to that of MUSIC.
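A least-squares solution of (1.17) gives a compact implementation. The sketch below assumes a half-wavelength ULA, so that the two subarrays are the first and last m − 1 sensors and φk = π sin(θk); the function name is ours.

```python
import numpy as np

def esprit_doas(R_hat, d):
    """ESPRIT DOA estimates (radians) for a half-wavelength ULA.

    Psi is solved from Es1 Psi = Es2 in the least-squares sense, cf. (1.17),
    and the DOAs are recovered from the angles of its eigenvalues."""
    m = R_hat.shape[0]
    _, E = np.linalg.eigh(R_hat)
    Es = E[:, m - d:]                       # d signal eigenvectors
    Es1, Es2 = Es[:-1, :], Es[1:, :]        # the two (m-1)-sensor subarrays
    Psi = np.linalg.lstsq(Es1, Es2, rcond=None)[0]
    phi = np.angle(np.linalg.eigvals(Psi))  # phi_k = pi * sin(theta_k)
    return np.arcsin(phi / np.pi)
```

With the exact covariance and non-coherent signals, the estimates are exact up to numerical precision.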
Weighted Subspace Fitting

In [VO91], a general framework for subspace methods is presented. It is shown that many of the existing subspace-based techniques can be formulated as a subspace fitting problem. Indeed, the relation (1.12) indicates that, given an estimate of the “signal subspace”, Ês, the parameters θ should be chosen such that R{Ês} is contained in R{A(θ)}. (It can be shown that this uniquely determines θ under weak conditions [WZ89].) In the presence of noise, it may not be possible to find an exact solution, and a measure of closeness has to be specified. The signal subspace fitting formulation relies on a criterion which gives the best weighted least squares fit of the subspaces in (1.12). Thus, consider the criterion

    min_{θ,T} ‖Ês W^{1/2} − A(θ)T‖²_F,                               (1.18)

where ‖A‖²_F = Tr{AA*} denotes the square of the Frobenius matrix norm, Ês is an estimate of Es, and T is an arbitrary d × d′ matrix. The weighting matrix W^{1/2} is d′ × d′ and assumed to be Hermitian and positive definite. It is possible to explicitly solve (1.18) with respect to T. The solution is given by T̂ = (A*A)^{-1} A* Ês W^{1/2} = A† Ês W^{1/2}, where the argument θ has been suppressed for notational convenience. Inserting the solution for T in (1.18) yields

    V(θ) = ‖Ês W^{1/2} − AA† Ês W^{1/2}‖²_F
         = ‖Π⊥_A Ês W^{1/2}‖²_F = Tr{Π⊥_A Ês W Ês*},                 (1.19)
where the facts that Π⊥_A Π⊥_A = Π⊥_A and W^{1/2} W^{*/2} = W are used. The estimate of θ is obtained as the minimizing argument of V(θ),

    θ̂ = arg min_θ V(θ).                                             (1.20)

As shown in [VO91], different choices of W lead to a whole class of estimators. It is also shown that there exists a specific weighting which minimizes the asymptotic parameter estimation error covariance matrix (in terms of matrix inequalities). The optimal weighting is given by

    W_WSF = (Λs − σ²I)² Λs^{-1} = Λ̃s² Λs^{-1}.

With this specific choice of weighting, the algorithm is usually referred to as the weighted subspace fitting (WSF) method. Furthermore, it is shown in [VO91] that the asymptotic covariance attains the Cramér-Rao lower bound and, hence, WSF is asymptotically statistically efficient. This implies that WSF and ML have the same asymptotic performance.

The solution of (1.20) requires a nonlinear optimization in d dimensions (if there is only one parameter associated with each signal). However, due to the form of the criterion, the implementation of WSF can be made more efficient than the implementation of ML [OVSN93].
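The criterion (1.19)–(1.20) with the optimal weighting can be evaluated as in the sketch below. It assumes a half-wavelength ULA, and estimates σ² as the average of the m − d smallest eigenvalues of the sample covariance (a common choice, not spelled out here); the function name is ours.

```python
import numpy as np

def wsf_criterion(theta, R_hat, d):
    """WSF cost V(theta) = Tr{Π⊥_A Ês W Ês*} with the optimal weighting
    W = (Λs − σ²I)² Λs^{-1}, cf. (1.19); half-wavelength ULA assumed."""
    m = R_hat.shape[0]
    w, E = np.linalg.eigh(R_hat)                 # ascending eigenvalues
    Es, lam_s = E[:, m - d:], w[m - d:]
    sigma2 = w[:m - d].mean()                    # noise-level estimate
    W = np.diag((lam_s - sigma2) ** 2 / lam_s)   # optimal WSF weighting
    k = np.arange(m)[:, None]
    A = np.exp(1j * np.pi * k * np.sin(np.asarray(theta))[None, :])
    A_pinv = np.linalg.solve(A.conj().T @ A, A.conj().T)
    Pi_perp = np.eye(m) - A @ A_pinv
    return np.trace(Pi_perp @ Es @ W @ Es.conj().T).real
```

For the exact covariance with non-coherent signals, V vanishes at the true DOAs.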
In [SS90a], the WSF criterion (1.19) is derived in a different way than described here. In that paper, the criterion is obtained by a maximum likelihood approach based on the statistics of the sample eigenvectors, and the method is referred to as MODE (method of direction estimation). For uniform linear arrays with identical omni-directional sensors, the minimization in (1.20) can be implemented in a very appealing manner [SS90b, SS90a]. The simplification is possible through a linear reparameterization of the nullspace of A*. See Section 2.4 for more details on this.
1.1.3 Discussion

In this section, the different topics of the thesis are related to the background material presented above.

Chapter 2 concerns the application of different DOA estimators to the forward-backward (FB) sample covariance matrix of the output from a ULA. For the current discussion, it suffices to say that the FB approach uses the reflectional symmetry of the ULA to, in principle, “decorrelate” the signals (see Chapter 2 for a more detailed description). By this operation, it is expected that estimation methods that work well in scenarios with low signal correlation should improve their performance when applied to the FB preprocessed covariance matrix. To show that this indeed may be the case, consider the following example.
Example 1.2. Consider a ULA consisting of six omni-directional and identical sensors separated by half of the carrier’s wavelength. The two signals impinge on the array from θ1 = −7.5° and θ2 = 7.5° relative to broadside. The signal covariance is

    P = [ 1               0.8 e^{iπ/4}
          0.8 e^{−iπ/4}   1            ].

The noise variance is fixed to σ² = 1. The DOAs are estimated in four different ways: by applying the root-MUSIC and ESPRIT methods to either the standard sample covariance matrix, or to the forward-backward
Figure 1.3: Root-mean-square errors for θ1 (in degrees) for root-MUSIC (*), FB-MUSIC (o), ESPRIT (x) and FB-ESPRIT (+) versus the number of snapshots N. The solid line represents the CRB.
sample covariance. The acronyms FB-MUSIC and FB-ESPRIT will be used when root-MUSIC and, respectively, ESPRIT are applied to the eigenvectors of the FB sample covariance matrix. The root-mean-square (RMS) errors for the estimates obtained by the four methods are compared for different sample lengths in Figure 1.3. (Only the RMS values for θ1 are shown; the plot corresponding to θ2 is similar.) The sample RMS values are based on 500 independent trials. From the figure, it can be seen that the FB approach gives a significant improvement in performance for both MUSIC and ESPRIT.
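For reference, the FB sample covariance used in the example can be formed as below. This assumes the usual definition R̂_FB = (R̂ + J R̂* J)/2, with J the exchange (flip) matrix; Chapter 2 gives the precise construction.

```python
import numpy as np

def fb_covariance(R_hat):
    """Forward-backward averaged covariance (R + J conj(R) J)/2, where J is
    the exchange matrix with ones on the anti-diagonal."""
    J = np.eye(R_hat.shape[0])[::-1]
    return 0.5 * (R_hat + J @ R_hat.conj() @ J)
```

By construction the result is both Hermitian and persymmetric, which is what yields the decorrelating effect for a ULA.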
While the performance of MUSIC and ESPRIT can be enhanced by using the FB approach, it is shown in Chapter 2 that this is not true for WSF or, equivalently, MODE. On the contrary, in general the performance of WSF/MODE degrades when applied to the FB sample covariance. Hence, the conclusion of Chapter 2 is that the FB approach should not be used with a statistically efficient method such as WSF/MODE.
A number of idealizations and assumptions were made when deriving the array model. In practice these may be more or less invalid. For example, the array model depends on exact knowledge of the sensor positions, the gain and phase of the independent sensors, etc. Sometimes the array response is obtained by so-called array calibration. This simply consists of measuring the response of the array when it is excited by signals in known locations. Needless to say, this does not give a perfect, error-free model. In Chapter 3, a method that takes small model errors into account when estimating the directions of arrival is proposed. By compensating for the model errors in the estimator, a more robust method is obtained.
It is well known that the MUSIC estimator performs well when the emitter signals are uncorrelated; often it is even considered close to optimal in that case. However, it seems meaningful to ask: what is the best possible performance that can be obtained if it is a priori known that the signals are uncorrelated? In Chapter 4, the corresponding CRB is derived. As expected, this CRB is lower than the “standard” CRB, in which the signal correlation is treated as arbitrary. This implies that if one can derive a method that optimally exploits the a priori knowledge that the emitter signals are uncorrelated, then it will perform better than, for example, MUSIC and WSF. One efficient method is maximum likelihood (ML). The ML estimate for the current problem is obtained by minimizing the criterion in (1.6) under the constraint that the signal covariance is diagonal. However, this appears to be a quite complicated minimization problem of high dimension (since the unknowns are the DOAs, the diagonal elements of P, and σ²). A second alternative is the generalized least squares (GLS) method (also known as the method of moments or the covariance matching method), in which the sample and the model covariances are matched in a weighted least squares sense. The GLS method is known to be asymptotically statistically efficient for the case under study (see [And84] and the references therein). Using the GLS method on the problem at hand, it is possible to separate the criterion so that the direction estimates are obtained from a d-dimensional non-linear minimization. Hence, compared to ML, this leads to a less complicated optimization problem. In Chapter 4, we propose a novel method for direction estimation in cases when it is a priori known that the emitter signals are uncorrelated. The method is shown to be asymptotically statistically efficient. Furthermore, the method allows for a relatively simple implementation when the array is uniform and linear. Then, the estimates are obtained using standard matrix operations (such as an eigendecomposition) only; hence, there is no need for an iterative search procedure.

More precise descriptions of the contributions of the thesis are given in Section 1.3.

Figure 1.4: Linear finite dimensional time-invariant dynamic system (u(t) is the observed input, w(t) and v(t) are unknown inputs, and y(t) is the output).
1.2 System Identification

The second part of this thesis concerns subspace system identification. System identification deals with the topic of obtaining mathematical models of dynamic systems from measured data. The applications of system identification techniques are widespread in almost all engineering disciplines. The theory is firmly established and covered in several books, including [Boh91, Eyk74, GP77, HD88, Joh93, Lju87, SS89].
1.2.1 Background

Consider the situation in Figure 1.4. In the figure, u(t) and y(t) are observed input and output signals, respectively. The signals w(t) and v(t) represent unknown inputs to the system. All signals may very well be vector valued. Assume that the system is linear and time-invariant and, hence, that the observed output can be described as

    y(t) = G(q)u(t) + H(q)e(t).                                      (1.21)
In (1.21), the transfer operators, or transfer functions, H(q) and G(q) are functions of the forward shift operator q (qu(t) = u(t+1)). Moreover, H(q) is monic, which means that its first impulse response coefficient is the identity, I. The part H(q)e(t), where {e(t)} is a white noise process, encompasses the effect of both w(t) and v(t) in Figure 1.4.

System identification aims at estimating the transfer functions H(q) and G(q) from measurements of u(t) and y(t) at the time instances t = 1, 2, ..., N. A very general approach which has proved successful is the prediction error method (PEM). The idea of PEM is to minimize the difference between the measured output y(t) and the prediction of y(t) based on past data. The best linear one-step-ahead predictor is given by [Lju87]

    ŷ(t, θ) = H^{-1}(q, θ) G(q, θ) u(t) + [I − H^{-1}(q, θ)] y(t),

where G(q, θ) and H(q, θ) are parameterized models of G(q) and H(q), respectively. The PEM estimate of θ is defined by

    θ̂ = arg min_θ V(θ),

where a typical choice of the criterion is

    V(θ) = (1/N) ∑_{t=1}^{N} [y(t) − ŷ(t, θ)]^T W [y(t) − ŷ(t, θ)]

for a positive definite weighting matrix W.
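For a scalar first-order ARX model, the one-step predictor is linear in the parameters, so the PEM criterion is minimized exactly by least squares. The sketch below illustrates this special case of (1.21); the model order and function name are our choices for illustration only.

```python
import numpy as np

def pem_arx1(u, y):
    """PEM for the scalar first-order ARX model
        y(t) = a*y(t-1) + b*u(t-1) + e(t).
    The predictor yhat(t) = a*y(t-1) + b*u(t-1) makes V(theta) quadratic,
    so the PEM estimate is the least-squares solution for [a, b]."""
    Phi = np.column_stack([y[:-1], u[:-1]])      # regressors [y(t-1), u(t-1)]
    theta, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)
    return theta                                  # array [a, b]
```

For general model structures the predictor is nonlinear in θ and an iterative search is needed, as discussed next.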
It should be remarked that the use of PEM is not limited to the case where there exists a “true system” represented by (1.21). The system can be of higher dimension than the model, and it can be non-linear. The identified model will then be an approximation of the system with certain properties [Lju87, SS89].

In general, the minimization of the prediction errors requires a nonlinear iterative search. The success of such procedures depends heavily on whether there exists a sufficiently good initial estimate. It is also important to have a numerically reliable (canonical) parameterization of G(q, θ) and H(q, θ). This issue is most difficult for multi-variable systems (see, for example, [VL82, Lju87, GW84, GW74]).
The subspace identification methods have in many cases been shown to give accurate models. Those models can either be used as they are, or they may serve as initial estimates for PEM to give a refined model. There exist alternative methods for providing initial model estimates, such as instrumental variable and two-stage least-squares methods [Lju87, SS89]. One reason for the interest in the subspace methods is computational. Many subspace methods can be implemented in a very effective way by making use of numerical linear algebra tools such as the QR factorization and the singular value decomposition [CXK94, CK95, VD91, VD96].
1.2.2 Subspace Methods

Subspace methods for system identification have their origin in realization theory (see, for example, [HK66, Aka74a, Aka74b, Fau76]). The deterministic realization algorithm of Ho and Kalman [HK66] was turned into an identification algorithm in [ZM74, Kun78], where the singular value decomposition (SVD) was used as a tool to suppress the effects of noise. Later extensions include [Bay92, KDS88, LS91, Lju91, JP85]. These approaches are based on impulse response data. Often it is difficult to get accurate estimates of the impulse response coefficients from data. The subspace methods considered here directly utilize the measured time domain data. Some basic approaches include [Gop69, De 88, DVVM88, MDVV89, Ver91]. More general methods are presented in [Lar83, VD94b, Ver94, PSD96]. These methods consider combined deterministic-stochastic systems where the output is due to both observed and unobserved inputs. There are also subspace methods for the pure time series case [Aka75, Aok87, AK90, VD93, Lar83, DPS95] and for the errors-in-variables problem [SCE95, CS96]. However, the main interest in what follows is in the combined methods. We refer to the survey papers [Vib95, RA92, VDS93] and the book [VD96] for a more comprehensive presentation of (the history of) the subspace identification methods.
In the following, a brief introduction to the basic ideas that underlie the subspace identification methods is given. A linear time-invariant discrete-time system of order n can be described by the state-space equations

    x(t + 1) = A x(t) + B u(t) + w(t),                               (1.22a)
    y(t)     = C x(t) + D u(t) + v(t).                               (1.22b)

Here, u(t) ∈ R^m and y(t) ∈ R^l represent the observed input and output data, respectively. The state vector x(t) ∈ R^n is an internal variable. The state equation is perturbed by the process noise w(t) ∈ R^n, and the output equation contains the measurement noise v(t) ∈ R^l. Both w(t) and v(t) are white noise processes. We assume that the description is minimal; that is, there is no other state-space description with lower state dimension than n. Observe that, if the state sequence {x(t)} were known, then it would be straightforward to estimate the state-space matrices {A, B, C, D} in (1.22) by linear regression. The question is how to estimate the state from the measured data {u(t), y(t)}.
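The remark about linear regression can be made concrete: if the state sequence were available, (A, B, C, D) would follow from two least-squares problems with the stacked regressor [x(t); u(t)]. A sketch (our own function name, noise-free case for clarity):

```python
import numpy as np

def regress_state_space(X, U, Y):
    """Least-squares estimates of (A, B, C, D) from a known state sequence:
    x(t+1) = A x(t) + B u(t) and y(t) = C x(t) + D u(t), cf. (1.22)."""
    n = X.shape[0]
    Z = np.vstack([X[:, :-1], U[:, :-1]]).T        # rows: [x(t); u(t)]^T
    AB = np.linalg.lstsq(Z, X[:, 1:].T, rcond=None)[0].T
    CD = np.linalg.lstsq(Z, Y[:, :-1].T, rcond=None)[0].T
    return AB[:, :n], AB[:, n:], CD[:, :n], CD[:, n:]
```

With noisy data the same regressions give consistent estimates under suitable conditions; the hard part, addressed below, is producing the state sequence in the first place.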
Before presenting the data model that is used for state estimation, we need to define the following block matrices:

    Γα = [ C
           CA
           ⋮
           CA^{α−1} ],

    Φα = [ D            0      ···   0
           CB           D      ⋱     ⋮
           ⋮            ⋱      ⋱     0
           CA^{α−2}B    ···    CB    D ],

    Ψα = [ 0            0      ···   0    0
           C            0      ⋱     ⋮    ⋮
           ⋮            ⋱      ⋱     0    0
           CA^{α−2}     ···           C    0 ].
Notice that Γα is the extended observability matrix if α > n, and it has full column rank since the system is minimal. Introduce the notation

    yα(t) = [y^T(t)  y^T(t + 1)  ...  y^T(t + α − 1)]^T,             (1.23)

which is of dimension αl. Similarly, define the vectors uα(t), wα(t), and vα(t) made from the inputs, process noise, and measurement noise, respectively.

We are now ready to give the basic data model that is used in the subspace identification methods. From the state-space equations (1.22), it is easily seen that

    yα(t) = Γα x(t) + Φα uα(t) + nα(t),                              (1.24)

where

    nα(t) = Ψα wα(t) + vα(t).
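The matrices Γα and Φα, and the relation (1.24), can be checked numerically in the noiseless case (w = v = 0). A sketch, following the block definitions above (function name ours):

```python
import numpy as np

def gamma_phi(A, B, C, D, alpha):
    """Gamma_alpha (extended observability matrix) and the block
    lower-triangular Toeplitz matrix Phi_alpha, as defined above."""
    l, m = D.shape
    Gamma = np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(alpha)])
    Phi = np.zeros((alpha * l, alpha * m))
    for i in range(alpha):                 # block row
        for j in range(i + 1):             # block column (lower triangle)
            blk = D if i == j else C @ np.linalg.matrix_power(A, i - j - 1) @ B
            Phi[i * l:(i + 1) * l, j * m:(j + 1) * m] = blk
    return Gamma, Phi
```

In the noiseless case, stacking α consecutive outputs then reproduces yα(t) = Γα x(t) + Φα uα(t) exactly.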
1.2 <strong>System</strong> <strong>Identification</strong> 21<br />
Observe that the vectors yα(t), uα(t), and nα(t) contain data from time t and onwards. The state x(t) contains the information about the past necessary to predict the future behavior of the system. In fact,

$$ \hat y_\alpha(t) = \Gamma_\alpha x(t) $$

is the optimal (in the mean-square sense) prediction of yα(t) given data up to time t. Assume for the moment that ŷα(t) is known for t = 1, 2, …, N. Define the matrices

$$ \hat Y_\alpha = \begin{bmatrix} \hat y_\alpha(1) & \hat y_\alpha(2) & \cdots & \hat y_\alpha(N) \end{bmatrix}, \qquad (1.25) $$
$$ X = \begin{bmatrix} x(1) & x(2) & \cdots & x(N) \end{bmatrix}. \qquad (1.26) $$

Clearly,

$$ \hat Y_\alpha = \Gamma_\alpha X \qquad (1.27) $$

and it is of dimension αl × N. If N > n and αl > n, then⁴

$$ \operatorname{rank}\{\hat Y_\alpha\} = n \qquad (1.28) $$
and, hence, Ŷα is low-rank. The factorization in (1.27) is not unique since (ΓαT)(T⁻¹X) = ΓαX for any non-singular matrix T. This non-uniqueness corresponds to a change of state-space basis (x(t) → T⁻¹x(t)) and changes only the internal structure of the system description. The input-output relation does not change. This implies that if Ŷα is known, then Γα and X (in some state-space basis) can be obtained by a low-rank factorization of Ŷα.

It remains to show how to estimate Ŷα, and how to factorize that estimate. Recall the relation (1.24) and observe that the state vector x(t) contains the past information. In other words, x(t) can be written as a linear combination of the past data {u(t−k), y(t−k)} for k = 1, 2, …, ∞.
From (1.24), we can write

$$
\begin{aligned}
y_\alpha(t) &= \Gamma_\alpha x(t) + \Phi_\alpha u_\alpha(t) + n_\alpha(t) \\
&\approx \Gamma_\alpha \left( K_1 y_\beta(t-\beta) + K_2 u_\beta(t-\beta) \right) + \Phi_\alpha u_\alpha(t) + n_\alpha(t) \\
&\triangleq L_1 y_\beta(t-\beta) + L_2 u_\beta(t-\beta) + \Phi_\alpha u_\alpha(t) + n_\alpha(t), \qquad (1.29)
\end{aligned}
$$
⁴ Here, we assume that X has full row rank (= n), which is generically true for large N if the inputs are sufficiently exciting. This is actually at the basis of the realization theory presented in [Aka74a, Aka74b, HK66].
where the notation uβ(t−β) and yβ(t−β) is analogous to (1.23):

$$ u_\beta(t-\beta) = \begin{bmatrix} u^T(t-\beta) & u^T(t-\beta+1) & \cdots & u^T(t-1) \end{bmatrix}^T, $$
$$ y_\beta(t-\beta) = \begin{bmatrix} y^T(t-\beta) & y^T(t-\beta+1) & \cdots & y^T(t-1) \end{bmatrix}^T. $$
In (1.29), L1 = ΓαK1 and L2 = ΓαK2 are two unknown matrices. The approximation in (1.29) is due to the fact that the amount of past input-output data is truncated: only β past samples are used to estimate x(t).⁵ Since Γα is unknown, it is actually Γαx(t) that is estimated from the past data in (1.29). The idea is to use (1.29) to estimate ŷα(t) as

$$ \hat y_\alpha(t) \approx L_1 y_\beta(t-\beta) + L_2 u_\beta(t-\beta). $$

If Φα were known, then L1 and L2 could be estimated by linear regression. However, since Φα is unknown, it is estimated as well. Let L̂1, L̂2, and Φ̂α denote the estimates of the respective quantities obtained by linear regression in (1.29). An estimate of the prediction matrix Ŷα in (1.25) is then

$$ \hat Y_\alpha = \hat L_1 \begin{bmatrix} y_\beta(1) & \cdots & y_\beta(N) \end{bmatrix} + \hat L_2 \begin{bmatrix} u_\beta(1) & \cdots & u_\beta(N) \end{bmatrix}. $$
For finite data (finite N), the estimate Ŷα typically has full rank. Since the rank of the true prediction matrix in (1.27) equals n, it is sensible to somehow reduce the rank of Ŷα. The singular value decomposition (SVD) is a powerful tool in numerical linear algebra. One of its strong properties is that it provides the best (in spectral or Frobenius norm) low-rank approximation of a full-rank matrix [GV96]. Let

$$
\hat Y_\alpha = \hat Q \hat S \hat V^T
= \begin{bmatrix} \hat Q_s & \hat Q_n \end{bmatrix}
\begin{bmatrix} \hat S_s & 0 \\ 0 & \hat S_n \end{bmatrix}
\begin{bmatrix} \hat V_s^T \\ \hat V_n^T \end{bmatrix} \qquad (1.30)
$$
be the SVD of Ŷα. In (1.30), Q̂ and V̂ are orthonormal matrices and Ŝ is a diagonal matrix having the singular values arranged in non-increasing order along the diagonal. The partitioning in (1.30) is made so that Ŝs is n × n and Ŝn contains the remaining (small) singular values. Notice

⁵ For simplicity, we have taken the same number, β, of past inputs and outputs. Of course, it is possible to take different numbers for the input and the output, respectively.
that Ŝn = 0 in the ideal case (N → ∞). Retaining the part of the SVD corresponding to the n largest singular values gives

$$ \hat Y_\alpha \approx \hat Q_s \hat S_s \hat V_s^T, \qquad (1.31) $$

which is the desired rank-n approximation of Ŷα. In view of (1.27) and (1.31), we obtain the following estimates of the extended observability matrix and the state sequence:

$$ \hat\Gamma_\alpha = \hat Q_s \hat S_s^{1/2} T, \qquad \hat X = T^{-1} \hat S_s^{1/2} \hat V_s^T. $$
Here, Ŝs^{1/2} denotes the square root of Ŝs, and T is an arbitrary non-singular n × n matrix. Recall that T merely determines the state-space basis of the realization and does not affect the input-output relation. We may, for example, choose T = I. Thus, by the manipulations above, we have an estimate of X which can be used to estimate the system matrices by linear regression in the state-space equations (1.22), as mentioned before. Furthermore, the sample covariance of the residuals in the regression gives an estimate of the covariance of the noise process [wᵀ(t) vᵀ(t)]ᵀ.
The procedure presented above summarizes the general ideas that, more or less explicitly, are used in the subspace system identification methods. Clearly, the crucial step is how to estimate the states. The approximations made above are the truncation of past data (β) and the estimation of Ŷα by linear regression. It can be shown [PSD96] that the state estimate derived above is consistent if β → ∞. The existing subspace methods contain several modifications of the above procedure. One reason these modifications are made is that they make it possible to obtain consistent estimates of the system matrices for a finite β.
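The whole procedure can be sketched in a few lines of numpy. The system matrices, α, β, and the white-noise input below are assumptions chosen purely for illustration; the example is noise-free and β ≥ n, so the regression (1.29) is exact and Ŷα comes out with numerical rank exactly n (with noise, the SVD truncation step becomes essential):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy noise-free 2nd-order system (assumed numbers); alpha and beta are
# the user-chosen stacking/truncation parameters discussed in the text.
A = np.array([[0.9, 0.1], [0.0, 0.6]])
B = np.array([[1.0], [0.4]])
C = np.array([[1.0, 0.0]])
n, alpha, beta, N = 2, 5, 8, 2000

total = N + alpha + beta
u = rng.standard_normal(total)
y = np.zeros(total)
x = np.zeros(2)
for t in range(total):
    y[t] = C[0] @ x                      # D = 0 for simplicity, no noise
    x = A @ x + B[:, 0] * u[t]

# Columns of Y are y_alpha(t); columns of Z stack
# [y_beta(t-beta); u_beta(t-beta); u_alpha(t)], as in (1.29).
ts = range(beta, beta + N)
Y = np.column_stack([y[t:t + alpha] for t in ts])
Z = np.column_stack([np.concatenate([y[t - beta:t], u[t - beta:t],
                                     u[t:t + alpha]]) for t in ts])

# Linear regression (1.29); keep only the past-data part to form Y_hat.
Theta = Y @ np.linalg.pinv(Z)
Y_hat = Theta[:, :2 * beta] @ Z[:2 * beta, :]

# SVD: here Y_hat has numerical rank n; its leading part gives Gamma_hat.
Q, s, Vt = np.linalg.svd(Y_hat, full_matrices=False)
Gamma_hat = Q[:, :n] * np.sqrt(s[:n])    # choosing T = I
print(s[:4])                             # n dominant singular values, rest ~ 0
```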
Conceptually, the idea of first estimating the state and then the system matrices by linear regression is very appealing. However, as mentioned above, it is complicated to obtain a consistent estimate of the state. A common approach to estimating A and C is instead to use the shift-invariance structure in Γα, as in the classical realization algorithm by Ho and Kalman [HK66] (see Section 5.5). Recall that an estimate of Γα is obtained from the factorization of Ŷα in (1.30). An advantage of methods based on Γ̂α is that it is much easier to obtain a consistent estimate of Γα than it is to estimate the states (see Chapter 6). Given estimates of A and C, the B and D matrices can be estimated by linear
regression, since the transfer function from u(t) to y(t) is linear in B and D. (There are also other possibilities [VD96, Ver94].)
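A minimal sketch of the shift-invariance idea (with illustrative matrices and a noise-free, exact Γα): deleting the last block row of Γα and multiplying by A yields Γα with its first block row deleted, so A can be recovered by least squares. In an unknown state-space basis one obtains T⁻¹AT, which has the same eigenvalues, i.e. the same poles:

```python
import numpy as np

# Shift invariance of the extended observability matrix: for l outputs,
#   Gamma[:-l] @ A = Gamma[l:]   =>   A = pinv(Gamma[:-l]) @ Gamma[l:]
A = np.array([[0.9, 0.1], [0.0, 0.6]])   # assumed example system
C = np.array([[1.0, 0.5]])
l, alpha = 1, 5

Gamma = np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(alpha)])

A_hat = np.linalg.pinv(Gamma[:-l]) @ Gamma[l:]   # estimate of A (in this basis)
C_hat = Gamma[:l]                                # first block row is C

print(np.allclose(A_hat, A), np.allclose(C_hat, C))  # True True (exact Gamma)
```

With a consistent estimate Γ̂α in place of Γα, the same two lines give consistent estimates of A and C up to a similarity transformation.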
Another approach is presented by Van Overschee and De Moor [VD94b]. There, it is shown that the state estimate obtained for fixed β can be interpreted as the output of a bank of non-steady-state Kalman filters. This observation is used to rewrite the state-space equations into a slightly modified linear regression from which the system matrices {A, B, C, D} can be estimated by least squares.
1.2.3 Discussion

Many of the subspace system identification methods have been derived in an algebraic framework involving orthogonal and oblique projections on the data. Traditionally, such operations are described in the system identification literature in terms of cross-correlations. The subspace methods also make extensive use of tools from numerical linear algebra, which is beneficial at the implementation level but complicates the understanding of the underlying ideas. Among other things, this has somewhat obscured the relation between classical methods (like PEM and instrumental variable methods) and the 4SID methods. It is still a topic of research to bring these two approaches closer. One of the aims of Chapter 5 is to point to similarities and differences by formulating the subspace estimation problem in a linear regression framework. (By subspace estimation we mean the estimation of the extended observability matrix Γα.) This formulation is similar to the one in the previous section and gives some further insights into the relation between the different subspace estimates that have been proposed in the literature. The linear regression formulation unifies the subspace estimation step of most existing methods. Weighting matrices that appear rather ad hoc in many previous derivations fall naturally into the formulation of the subspace estimation in terms of a linear regression cost function minimization. As seen above, there are a number of truncation parameters (α and β) that are to be chosen by the user. Some effects of different choices of these parameters are studied in Chapter 5 by means of asymptotic variance expressions for the pole estimates. It can be shown that most of the common choices of subspace estimates lead to the same asymptotic variance of the poles when the observed input is white noise. For colored inputs, however, there are differences. This is illustrated in a numerical example.

The subspace identification methods rely on the fact that a certain
cross-correlation matrix (Ŷα) is low-rank. The rank should be equal to the system order to guarantee identifiability. Given the "true" system and the signal properties, it is of course possible to check this condition. If only the system order n is known, it is also possible to test whether the data quantity is "sufficiently" close to a rank-n matrix by, for example, a hypothesis test. If the rank is too small, then this indicates that the data are not informative enough, and the experiment should be re-designed. In Chapter 6, excitation conditions on the data to ensure consistency are discussed. For more or less restrictive assumptions on the system, explicit persistence of excitation conditions on the observed input signal are given. An interesting observation made in Chapter 6 is the fact that a persistence of excitation condition on the input does not in general guarantee consistency. Indeed, there exist examples for which the aforementioned rank condition does not hold. This means that, in such cases, the subspace methods under consideration do not give a consistent estimate of the true system. One should, however, remember that such counterexamples are rather "pathological", and generically the rank condition holds under the persistence of excitation conditions given in the thesis. This result is perhaps more relevant from a performance point of view: it is expected that the accuracy of the estimated model is poor if the identification conditions are close to a "counterexample".
1.3 Outline and Contributions

This section gives a chapter-by-chapter description of the contributions made in the thesis. It also serves as a brief overview of what will follow.
Chapter 2

This chapter studies the application of the MODE (method of direction estimation) or weighted subspace fitting (WSF) principle to the principal (signal) eigenvectors of the forward-backward (FB) sample covariance matrix. The MODE principle consists of matching the signal eigenvectors to the array model in a statistically sound way. Somewhat surprisingly, it turns out that the best one can do is to apply the standard MODE estimator to the eigenelements of the FB covariance. It is shown that the accuracy of the so-obtained estimates is in general inferior to that of the standard MODE estimate. The FB approach has been shown to be beneficial for
other (suboptimal) array processing algorithms such as MUSIC and ESPRIT. However, in view of the abovementioned result, forward-backward averaging is not recommended in conjunction with MODE or WSF.

An article version of this chapter has been accepted for publication as

Petre Stoica and Magnus Jansson, "On Forward-Backward MODE for Array Signal Processing", Digital Signal Processing – A Review Journal, Aug. 1997.

This work is also related to the material presented in

Magnus Jansson and Björn Ottersten, "Covariance Preprocessing in Weighted Subspace Fitting", in Proc. IEEE/IEE Workshop on Signal Processing Methods in Multipath Environments, pp. 23-32, Glasgow, Scotland, Apr. 1995.

That paper analyzes the performance of MODE and WSF when the sample covariance matrix is linearly preprocessed before the estimator is applied. Forward-backward preprocessing is a special case for which that analysis holds.
Chapter 3

In many cases it is not the errors due to noise that are the limiting factor when high-resolution direction estimators are applied in real situations. The performance limitations may rather be due to errors in the array model. The problem of obtaining a robust direction of arrival estimator is considered in this chapter. The robustness refers to insensitivity to errors in the array model. A signal subspace fitting approach is taken, and a weighting that accounts for both the finite sample effects of the noise and the model errors is derived. It is shown that the weighting can be written as the sum of two terms. The first is nothing but the standard WSF weighting that is optimal in the case with no model errors. The second term compensates for the first-order effects of the errors in the array model. The asymptotic performance is shown to coincide with the CRB for the problem under study. There exist other methods (MAPprox and MAP-NSF) that have the same asymptotic optimality properties, but the proposed method (GWSF) has some important advantages. Compared to MAP-NSF, GWSF has a much better small-sample performance when the sources are closely spaced and/or when they are highly correlated.
The GWSF formulation also allows for a relatively simple and efficient implementation compared to both MAPprox and MAP-NSF. The chapter also contains a performance analysis of MAPprox which establishes the asymptotic optimality of the method, as has been observed in simulations.

The results presented in Chapter 3 have been submitted for possible publication in the following form:

Magnus Jansson, A. Lee Swindlehurst and Björn Ottersten, "Weighted Subspace Fitting for General Array Error Models", submitted to IEEE Trans. on Signal Processing, Jul. 1997.
Chapter 4

In many applications it is reasonable to assume that the signals that impinge on the array are independent of one another. In general, array processing methods do not presuppose any correlation among the signals. When it is known a priori that the signals are uncorrelated, it seems judicious to use this information to improve the performance of the estimator. A method which makes use of this a priori information in an optimal way is presented in Chapter 4; that is, it provides minimum variance direction estimates. Natural alternative approaches are to use maximum likelihood or covariance matching methods to impose the additional structure in the estimation problem. However, the proposed method appears to have computational advantages compared to the two aforementioned methods.

A version of Chapter 4 has been submitted as

Magnus Jansson, Bo Göransson and Björn Ottersten, "A Subspace Method for Direction of Arrival Estimation of Uncorrelated Emitter Signals", submitted to IEEE Trans. on Signal Processing, Aug. 1997.

Some of the results given in Chapter 4 have also appeared in

Bo Göransson, Magnus Jansson and Björn Ottersten, "Spatial and Temporal Frequency Estimation of Uncorrelated Signals Using Subspace Fitting", in Proceedings of the 8th IEEE Signal Processing Workshop on Statistical Signal and Array Processing, pp. 94-96, Corfu, Greece, June 1996.
Magnus Jansson, Bo Göransson and Björn Ottersten, "Analysis of a Subspace-Based Spatial Frequency Estimator", in Proceedings of ICASSP'97, pp. 4001-4004, Munich, Germany, Apr. 1997.
Chapter 5

This chapter contains the linear regression interpretation of the subspace system identification methods. The linear regression formulation is used to interpret existing estimates of the observability matrix as the minimizing arguments of different cost functions. It is shown that the accuracy of the estimates strongly depends on the choice of a user-specified parameter. An important observation is that the choice of this parameter is system dependent, which makes the task non-trivial. It is also shown that, asymptotically in the number of samples, the pole estimates are invariant to a specific weighting matrix employed in the identification schemes. Numerical examples are given to further illustrate the effect of different choices of weighting matrices.

With minor changes, this chapter contains the article

Magnus Jansson and Bo Wahlberg, "A Linear Regression Approach to State-Space Subspace System Identification", Signal Processing, Special Issue on Subspace Methods, Part II: System Identification, Vol. 52, No. 2, pp. 103-129, July 1996.

Parts of this material have also appeared as

Bo Wahlberg and Magnus Jansson, "4SID Linear Regression", in Proc. IEEE 33rd Conf. on Decision and Control, pp. 2858-2863, Orlando, USA, Dec. 1994.

Magnus Jansson and Bo Wahlberg, "On Weighting in State-Space Subspace System Identification", in Proc. European Control Conference, ECC'95, pp. 435-440, Rome, Italy, Sep. 1995.

Magnus Jansson, "On Performance Analysis of Subspace Methods in System Identification and Sensor Array Processing", Licentiate Thesis (TRITA-REG-9503, ISSN 0347-1071), Automatic Control, Dept. of Signals, Sensors and Systems, KTH, Stockholm, June 1995.
Chapter 6

The consistency of the subspace estimate (or, equivalently, of the extended observability matrix) is studied. Persistence of excitation conditions on the input signal are given which prove consistency for systems where the output is perturbed by white measurement noise. If the system includes process noise, then it is shown that the subspace methods under consideration may fail to give a consistent estimate even though the input is persistently exciting to any order. An explicit example in which such a failure occurs is given. It is also shown that this problem is eliminated if the input is a white noise process (or a low-order ARMA process).

A version of Chapter 6 has been submitted as

Magnus Jansson and Bo Wahlberg, "On Consistency of Subspace Methods for System Identification", submitted to Automatica, Jun. 1997.

Parts of the material in Chapter 6 have been presented at the following conferences:

Magnus Jansson and Bo Wahlberg, "On Consistency of Subspace System Identification Methods", in Preprints of the 13th IFAC World Congress, pp. 181-186, vol. I, San Francisco, California, USA, Jul. 1996.

Magnus Jansson and Bo Wahlberg, "Counterexample to General Consistency of Subspace System Identification Methods", in Proc. 11th IFAC Symposium on System Identification, pp. 1677-1682, Fukuoka, Japan, Jul. 1997.
1.4 Topics for Future Research

Forward-backward (FB) averaging is a topic that has received considerable attention in the array processing literature. Despite this, it is not really known when FB averaging should or should not be used. For example, should FB averaging always be used when employing the MUSIC or ESPRIT estimators? It has been shown that, in the case of uncorrelated emitter signals, the asymptotic estimation error variance is not improved, but the resolution threshold is decreased [KP89]. There are also some results for the case of correlated emitter signals, but they seem to be less explicit [RH93, PK89]. If the true covariance matrix is
centro-Hermitian, then it is relatively straightforward to show that the FB sample covariance in (2.16) is the corresponding (structured) maximum likelihood (ML) estimate [Qua84]. Recall that the standard sample covariance is the unstructured ML estimate. It is then reasonable to expect that the FB sample estimate is better than the standard sample covariance, with lower estimation error variance. This does not directly follow from the ML properties, but the author has proved it explicitly and a report is in progress. In view of this, the result of Chapter 2 becomes even more surprising: although the FB covariance estimate is better, this does not necessarily lead to better direction estimates, as proved in Chapter 2 (see also [KP89]).
Similarly to the development in [VS94b], one can extend the derivation of GWSF in Chapter 3 to also include particular errors in the noise model. In [VS94b], a statistical model that allows for the compensation of first-order errors in the noise covariance is given. Using this model, an additional term in the weighting matrix for GWSF would result. It would also be interesting to evaluate GWSF on real array data.

The method (DEUCE) for direction estimation of uncorrelated emitter signals that is derived in Chapter 4 is admittedly quite cumbersome to implement. It would be interesting to investigate whether there exists a simpler estimator preserving the performance of DEUCE. Some preliminary results indicate that this really is conceivable. Indeed, consider the case of uncorrelated emitter signals impinging on a uniform linear array. The covariance of the array output vector is then a Toeplitz matrix. This also implies that the matrix employed in DEUCE (see (4.3)) ideally is Toeplitz. The idea is to first perform a simple averaging along the diagonals of the sample estimate of the matrix in (4.3) to obtain a sufficient statistic of lower dimension. Compared to DEUCE, this will lead to an estimator which is computationally less complicated. The results of Chapter 4 may also find applications in "structured covariance estimation". Since DEUCE provides asymptotically maximum likelihood estimates of the parameters that parameterize the Toeplitz covariance matrix, a structured covariance estimate can be constructed. More detailed results in this direction are expected in the near future. The ideas in Chapter 4 may also be useful in other similar estimation problems such as, for example, blind channel identification.
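As a rough illustration of the diagonal-averaging idea (using real-valued placeholder data for simplicity; an actual array covariance would be complex Hermitian Toeplitz), one could enforce Toeplitz structure on a sample covariance like this:

```python
import numpy as np

rng = np.random.default_rng(3)

# A sample covariance of stationary data is only approximately Toeplitz;
# averaging along its diagonals enforces the structure. This is a simple,
# non-optimal structured estimate sketching the idea described in the text.
m, N = 4, 1000
x = rng.standard_normal((m, N))            # placeholder array snapshots
R = (x @ x.T) / N                          # sample covariance (not Toeplitz)

r = np.array([np.mean(np.diagonal(R, k)) for k in range(m)])
R_toep = np.array([[r[abs(i - j)] for j in range(m)] for i in range(m)])

print(np.allclose(R_toep, R_toep.T))       # True: symmetric Toeplitz by construction
```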
The analysis of the state-space subspace system identification (4SID) methods in the thesis concerns the consistency of the methods and the asymptotic variance of the pole estimates. So far we have not been able to extend the analysis to obtain asymptotic variance expressions for the
estimated transfer function. We have only derived the asymptotic variance<br />
for the estimates of the A <strong>and</strong> C matrices obta<strong>in</strong>ed by the shift<br />
<strong>in</strong>variance method. Given estimates of A <strong>and</strong> C, the B <strong>and</strong> D matrices<br />
are often estimated by l<strong>in</strong>ear regression <strong>in</strong> the <strong>in</strong>put output relation. The<br />
estimates of A <strong>and</strong> C appear <strong>in</strong> the regression equations for estimat<strong>in</strong>g<br />
B <strong>and</strong> D, which makes it very complicated to compute the asymptotic<br />
variance for the estimates. Deriv<strong>in</strong>g asymptotic variance expressions for<br />
the estimated transfer function is an important topic for future research.<br />
Such variance expressions can be used to show the relation between the<br />
subspace methods <strong>and</strong> the prediction error methods. They can also be<br />
used to estimate the quality of the identified model. There exist several<br />
alternatives for estimat<strong>in</strong>g the system matrices. A topic for research is<br />
to evaluate their relative performance. Another question that can be addressed<br />
is how the future <strong>in</strong>puts really should be treated <strong>in</strong> the subspace<br />
estimation step. In the elaborations <strong>in</strong> this chapter <strong>and</strong> <strong>in</strong> Chapter 5,<br />
the matrix Φα is estimated by l<strong>in</strong>ear regression without impos<strong>in</strong>g any<br />
structure on the estimate. Recall that Φα is a lower triangular Toeplitz<br />
matrix. The simulation results <strong>in</strong> [PSD96] suggest that the performance<br />
can be improved by us<strong>in</strong>g a structured estimate of Φα. As po<strong>in</strong>ted out<br />
<strong>in</strong> the thesis, the subspace methods for system identification are quite<br />
sensitive to the <strong>in</strong>put excitation. It would be <strong>in</strong>terest<strong>in</strong>g to f<strong>in</strong>d ways to<br />
decrease this sensitivity. Us<strong>in</strong>g more of the structure <strong>in</strong> the problem is<br />
likely the best approach to achieve this goal.
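One way to impose the lower triangular Toeplitz structure mentioned above is a simple least-squares projection: average each subdiagonal of the unstructured estimate and zero the strictly upper triangular part. The sketch below is ours, not from the thesis; `Phi_hat` and its dimensions are hypothetical stand-ins for an unstructured estimate of Φα.

```python
import numpy as np

def project_lower_toeplitz(M):
    """Least-squares projection of a square matrix onto the subspace of
    lower triangular Toeplitz matrices: average each subdiagonal of M
    and zero out the strictly upper triangular part."""
    n = M.shape[0]
    T = np.zeros_like(M)
    for k in range(n):
        # mean of the k-th subdiagonal, replicated along that diagonal
        T += np.mean(np.diag(M, -k)) * np.eye(n, k=-k)
    return T

# A noisy unstructured estimate of a lower triangular Toeplitz matrix
rng = np.random.default_rng(0)
h = np.array([1.0, 0.5, 0.25, 0.125])                    # first column
Phi_true = sum(h[k] * np.eye(4, k=-k) for k in range(4))
Phi_hat = Phi_true + 0.01 * rng.standard_normal((4, 4))
Phi_struct = project_lower_toeplitz(Phi_hat)
```

Since this is an orthogonal projection onto a linear subspace (in the Frobenius norm), the structured estimate is never farther from the true matrix than the unstructured one.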
Part I

Sensor Array Signal Processing

Chapter 2

On Forward-Backward MODE for Array Signal Processing
In this chapter we apply the MODE (method of direction estimation) principle to the forward-backward (FB) covariance of the output vector of a sensor array to obtain what we call the FB-MODE procedure. The derivation of FB-MODE is an interesting exercise in matrix analysis, the outcome of which was somewhat unexpected: FB-MODE simply consists of applying the standard MODE approach to the eigenelements of the FB sample covariance matrix. By using an asymptotic expansion technique we also establish the surprising result that FB-MODE is outperformed, from a statistical standpoint, by the standard MODE applied to the forward-only sample covariance (F-MODE). We believe this to be an important result that shows that the FB approach, which has proved quite useful for improving the performance of many suboptimal array processing methods, should not be used with a statistically optimal method such as F-MODE.
2.1 Introduction

MODE is a method of direction estimation by means of a sensor array, which is known to be statistically efficient in cases when either the number of data samples (N) or the signal-to-noise ratio (SNR) is sufficiently large [VO91, SS90a, SN90]. For uniform linear arrays (ULAs) the implementation of MODE requires only standard matrix operations (such as an eigendecomposition) and is thus very straightforward. In fact, in the ULA case MODE is a clear candidate for the best possible (i.e., computationally simple and statistically efficient) array processing method [KV96].
We stress that the aforementioned statistical efficiency of MODE holds asymptotically (in N or SNR). In short-sample or low-SNR cases other methods may provide better performance. The forward-backward (FB) approach is a well-known methodology that has been successfully used to enhance the performance of a number of array signal processing algorithms [RH93, KP89, Pil89]. Recently this approach has been used in conjunction with MODE, probably in an attempt to obtain enhanced (finite-sample or finite-SNR) performance [Haa97b, Haa97a]. More exactly, in the cited publications the standard MODE approach was applied to the eigenelements of the FB sample covariance matrix.

The first problem we deal with in this chapter concerns the formal application of the MODE principle to the FB sample covariance (see also [JO95]). After an exercise in matrix analysis we prove the somewhat unexpected result that FB-MODE should indeed consist of applying the standard MODE to the eigenelements of the FB sample covariance matrix. Then we go on to establish the asymptotic statistical performance of FB-MODE. Because the standard F-MODE is asymptotically statistically efficient (in the sense that it achieves the Cramér-Rao bound (CRB)) [VO91, SS90a, SN90], we cannot expect FB-MODE to perform better in asymptotic regimes. What one might expect is that FB-MODE outperforms F-MODE in non-asymptotic regimes and that the two methods have the same asymptotic performance. We show that this conjecture on identical asymptotic performances, even though quite natural, is false: FB-MODE's asymptotic performance is usually inferior to that of F-MODE. Regarding the comparison of the two methods in non-asymptotic regimes, because the finite-sample/SNR analysis is intractable, we resort to Monte-Carlo simulations to show that F-MODE typically outperforms FB-MODE also in such cases.
In conclusion, the FB approach, which has been successfully employed to enhance the performance of several array signal processing algorithms, cannot be recommended for use with MODE. This result also shows that the related subspace fitting and asymptotic maximum likelihood approaches of [VO91] and [SS90a], which produced the statistically efficient MODE in the forward-only case, may fail to yield statistically optimal estimators in other cases.
2.2 Data Model

Let x(t) ∈ C^{m×1} denote the output vector of a ULA that comprises m elements. The variable t denotes the sampling point; hence, t = 1, 2, ..., N, where N is the number of available samples. Under the assumption that the signals impinging on the array are narrowband with the same center frequency, the array output can be described by the equation (see, e.g., [SM97])

    x(t) = As(t) + n(t).   (2.1)

Here, s(t) ∈ C^{d×1} is the vector of the d signals (translated to baseband) that impinge on the array, n(t) ∈ C^{m×1} is a noise term, and A is the Vandermonde matrix

    A = [ 1              ...  1
          e^{iω1}        ...  e^{iωd}
          ⋮                   ⋮
          e^{i(m−1)ω1}   ...  e^{i(m−1)ωd} ],

where {ωk}_{k=1}^{d} are the so-called spatial frequencies (ωk = φk defined in (1.15)). We assume that

    ωk ≠ ωj  for k ≠ j

and that m > d, which implies that

    rank(A) = d.

The following assumption, frequently used in the array processing literature, is also made here: the signal vector and the noise are temporally white, zero-mean, circular Gaussian random variables that are independent of one another; additionally, the noise is spatially white and has the same power in all sensors. Mathematically, this assumption implies that

    E{s(t)s^*(s)} = P0 δ_{t,s},    (2.2)
    E{s(t)s^T(s)} = 0,             (2.3)
    E{n(t)n^*(s)} = σ² I δ_{t,s},  (2.4)
    E{n(t)n^T(s)} = 0,             (2.5)
where E is the statistical expectation operator, P0 denotes the signal covariance matrix, σ² is the noise power in each element of the array, δ_{t,s} is the Kronecker delta, and the superscripts * and T denote the conjugate transpose and the transpose, respectively. We also assume that

    rank(P0) = d,

which is only done to simplify the notation (indeed, most of the results presented in what follows carry over to the case when rank(P0) < d).

A principal goal of array processing consists of estimating {ωk}_{k=1}^{d} from {x(t)}_{t=1}^{N}, without assuming knowledge of the incoming signals {s(t)} or (of course) the noise {n(t)}. This is precisely the problem we will consider in this chapter. Once {ωk} are estimated, the directions (also called angles-of-arrival) of the d signals can be easily obtained [VO91, SS90a, SN90, RH93, KP89, Pil89, SM97].
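As a concrete illustration of the model (2.1) and the assumptions (2.2)-(2.5), the following sketch (our own, not part of the thesis; all parameter values are arbitrary) simulates ULA snapshots and forms the forward sample covariance:

```python
import numpy as np

rng = np.random.default_rng(1)
m, d, N = 8, 2, 1000                  # sensors, sources, snapshots
omega = np.array([0.6, 1.4])          # spatial frequencies omega_k
sigma2 = 0.1                          # noise power per sensor

# Vandermonde steering matrix: A[k, j] = exp(i * k * omega_j)
A = np.exp(1j * np.outer(np.arange(m), omega))

# Temporally white, zero-mean, circular Gaussian signals (P0 = I) and noise
s = (rng.standard_normal((d, N)) + 1j * rng.standard_normal((d, N))) / np.sqrt(2)
n = np.sqrt(sigma2 / 2) * (rng.standard_normal((m, N)) + 1j * rng.standard_normal((m, N)))
x = A @ s + n                         # array output, model (2.1)

R0_hat = (x @ x.conj().T) / N         # forward sample covariance
```

For large N, `R0_hat` approaches A P0 A^* + σ²I, in agreement with (2.6) below.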
2.3 Preliminary Results

In this section, we present most of the mathematical results that are used in the next sections to derive and analyze FB-MODE. Some of these results are known, yet we provide simple proofs for all results, for completeness.

It follows from (2.1) and (2.2)-(2.5) that the array output x(t) is a temporally white, zero-mean, circular Gaussian random variable with the covariance properties

    E{x(t)x^*(t)} = AP0A^* + σ²I ≜ R0,   (2.6)
    E{x(t)x^T(t)} = 0.

Let J denote the m × m reversal (exchange) matrix, with ones on the antidiagonal and zeros elsewhere:

    J = [ 0  ...  0  1
          0       1  0
          ⋮           ⋮
          1  0  ...  0 ].

Similarly, J̃ will denote the (m−d) × (m−d) reversal matrix. It is readily checked that

    JA^c = AΦ,   (2.7)
where the superscript c stands for the complex conjugate, and

    Φ = diag( e^{−i(m−1)ω1}, ..., e^{−i(m−1)ωd} ).

Hence, we have

    Jx^c(t) = AΦs^c(t) + Jn^c(t)   (2.8)

and, accordingly,

    JR0^c J = AΦP0^c Φ^* A^* + σ²I.

Let

    R = (1/2)(R0 + JR0^c J) = APA^* + σ²I,   (2.9)

where

    P = (1/2)(P0 + ΦP0^c Φ^*).
In the equations above, R0 is the so-called "forward" covariance matrix, whereas R is called the "forward-backward" covariance. The reason for this terminology is that, while the indices of the elements of x(t) run forward (1, 2, ..., m), those of the elements of Jx^c(t) run backward (m, m−1, ..., 1). It will be useful for what follows to observe that R0 and R have the same structure: the only difference between the expressions in (2.6) and (2.9) for these two matrices is that P0 in R0 is replaced by P in R.
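The definitions above are easy to check numerically. The following sketch (ours; the correlated P0 is an arbitrary example) confirms that R is Hermitian and centro-Hermitian and that it indeed equals APA^* + σ²I with P as defined above:

```python
import numpy as np

m, d = 6, 2
omega = np.array([0.5, 1.1])
sigma2 = 0.05
A = np.exp(1j * np.outer(np.arange(m), omega))     # Vandermonde steering matrix
P0 = np.array([[1.0, 0.9], [0.9, 1.0]])            # correlated signal covariance

R0 = A @ P0 @ A.conj().T + sigma2 * np.eye(m)      # forward covariance (2.6)
J = np.fliplr(np.eye(m))                           # m x m reversal matrix
R = 0.5 * (R0 + J @ R0.conj() @ J)                 # forward-backward covariance (2.9)

Phi = np.diag(np.exp(-1j * (m - 1) * omega))       # diagonal phase matrix
P = 0.5 * (P0 + Phi @ P0.conj() @ Phi.conj().T)    # FB signal covariance
```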
Next, let us introduce the (m−d) × m Toeplitz matrix

    B^* = [ b0  ...  bd           0
                ⋱         ⋱
            0        b0  ...  bd ],   (2.10)

where the complex-valued coefficients {bk} are defined through

    b0 + b1 z + ··· + bd z^d = bd ∏_{k=1}^{d} (z − e^{iωk}).   (2.11)
Because the polynomial in (2.11) has all its zeros on the unit circle, it can be written such that its coefficients satisfy the so-called "conjugate symmetry constraint" (see, e.g., [SS90b]):

    bk = b^c_{d−k}  (for k = 0, 1, ..., d).   (2.12)

It follows easily from (2.10) and (2.11) that

    B^* A = 0,   (2.13)

which, along with the fact that rank(A) = d and rank(B) = m − d, implies that

    R(B) = N(A^*).   (2.14)

Hereafter, R(·) and N(·) denote the range and the null space of the matrix in question, respectively.
To conclude these notational preparations, let

    R = [ Es  En ] [ Λs   0
                     0   σ²I ] [ Es  En ]^*   (2.15)

denote the eigenvalue decomposition (EVD) of the matrix R, where Es has d columns, En has m − d columns, the matrix of eigenvectors [Es En] is unitary,

    Es^* Es = I;  En^* En = I;  Es^* En = 0,

and where

    Λs = diag(λ1, ..., λd).

Here, {λk} are the d largest eigenvalues of R arranged in decreasing order: λ1 > λ2 > ··· > λd. We assume that λk ≠ λp (for k ≠ p; k, p = 1, 2, ..., d), which is generically true. Note from (2.15) that the smallest (m − d) eigenvalues of R are identical and equal to σ², a fact that follows easily from (2.9). The EVD of R0 is similarly defined, but with Es, En, and Λs replaced by Es0, En0, and Λs0.
The sample covariance matrices corresponding to R0 and R are defined as

    R̂0 = (1/N) Σ_{t=1}^{N} x(t)x^*(t)
and

    R̂ = (1/2)(R̂0 + JR̂0^c J).   (2.16)

It is well known that, under the assumptions made, R̂0 and R̂ converge (with probability one and in mean square) to R0 and R, respectively, as N tends to infinity. Hence R̂0 and R̂ are consistent estimates of the corresponding theoretical covariance matrices [SM97, And84, SS89]. In fact, R̂0 can be shown to be the (unstructured) maximum likelihood estimate (MLE) of R0 [And84, SS89]. By the invariance principle of ML estimation, it then follows that R̂ is the MLE of R. We let

    R̂ = [ Ês  Ên ] [ Λ̂s   0
                      0   Λ̂n ] [ Ês  Ên ]^*

denote the EVD of R̂, where Ês has d columns, Ên has m − d columns, and the eigenvalues in Λ̂s and Λ̂n are arranged in decreasing order. The EVD of R̂0 is similarly defined, but with a subscript 0 attached to Ês, etc., to distinguish it from the EVD of R̂.
2.3.1 Results on A and B

R1. The matrix B satisfies

    JB^c = BJ̃   (2.17)

and

    J̃ (B^*B) J̃ = (B^*B)^c.   (2.18)

Proof. Recall that the polynomial coefficients {bk} satisfy the conjugate symmetry constraint (2.12). The columns of B^c are shifted copies of the vector [b0 ... bd]^T, so the columns of JB^c are shifted copies of the reversed vector [bd ... b0]^T, placed at the mirrored positions. By (2.12), [bd ... b0]^T = [b0^c ... bd^c]^T, which means that the j-th column of JB^c equals the (m−d+1−j)-th column of B. Hence JB^c = BJ̃, which proves (2.17). Equation (2.18) is a simple corollary of (2.17).
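R1 and (2.13) are readily verified numerically. The sketch below (ours) builds B^* from (2.10)-(2.11), enforces the conjugate symmetry (2.12) by rotating the coefficients with a common phase, and checks B^*A = 0, (2.17), and (2.18):

```python
import numpy as np

m = 6
omega = np.array([0.5, 1.1, 2.0])                  # d = 3 spatial frequencies
d = len(omega)

# b0 + b1 z + ... + bd z^d proportional to prod_k (z - exp(i*omega_k))
b = np.poly(np.exp(1j * omega))[::-1]              # coefficients b0, ..., bd
# Rotate by a common phase so that b_k = conj(b_{d-k})  (constraint (2.12))
alpha = b[0].conj() / b[d]
b = b * np.exp(0.5j * np.angle(alpha))

# (m-d) x m Toeplitz matrix B^* of (2.10)
Bstar = np.zeros((m - d, m), dtype=complex)
for j in range(m - d):
    Bstar[j, j:j + d + 1] = b
B = Bstar.conj().T

A = np.exp(1j * np.outer(np.arange(m), omega))     # Vandermonde steering matrix
J = np.fliplr(np.eye(m))                           # m x m reversal matrix
Jt = np.fliplr(np.eye(m - d))                      # (m-d) x (m-d) reversal matrix
```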
R2. Let

    a(ω) = [ 1  e^{iω}  ...  e^{i(m−1)ω} ]^T

denote a generic column of the matrix A, and let

    d(ω) ≜ da(ω)/dω = i [ 0  e^{iω}  2e^{i2ω}  ...  (m−1)e^{i(m−1)ω} ]^T.

Then

    B^* Jd^c(ω) = e^{−i(m−1)ω} B^* d(ω)  for ω = ωj (j = 1, ..., d).   (2.19)

Proof. From (2.7), we have

    Ja^c(ω) = e^{−i(m−1)ω} a(ω).

Differentiating both sides with respect to ω gives

    Jd^c(ω) = e^{−i(m−1)ω} d(ω) − i(m−1) e^{−i(m−1)ω} a(ω).   (2.20)

In view of (2.13), B^* a(ω) = 0 at ω = ωj (j = 1, ..., d), and hence (2.19) is obtained by pre-multiplying (2.20) by B^*.
2.3.2 Results on R

R3. Let e denote a generic eigenvector of R, associated with a distinct eigenvalue λ. Then

    Je^c = e^{iψ} e  for some ψ ∈ (−π, π].   (2.21)

In particular, Equation (2.21) implies that

    JEs^c = EsΓ,

where Es is as defined in (2.15), and

    Γ = diag( e^{iψ1}, ..., e^{iψd} );  ψk ∈ (−π, π]  (k = 1, ..., d).

Proof. Because λ is real-valued and JR^cJ = R, we have the following equivalences:

    Re = λe  ⟺  JR^cJe = λe  ⟺  R(Je^c) = λ(Je^c).

Hence, Je^c is also an eigenvector associated with λ. Then it must be related to e as shown in (2.21), as λ is distinct by assumption.

The above result is a consequence of the centro-Hermitian property of R. Hence, a similar result holds for R̂ as well.
2.3.3 Results on P and R

R4. The condition number of P,

    cond(P) ≜ λmax(P) / λmin(P),

is never greater than that of P0. Here, λmax(P) and λmin(P) denote the largest and smallest eigenvalues of P, respectively. A similar result holds for R and R0.

Proof. For a Hermitian matrix it holds that

    λmax(P) = max_x (x^*Px)/(x^*x),
    λmin(P) = min_x (x^*Px)/(x^*x).

Hence, making use of the fact that P0 and P0^c have the same eigenvalues, we obtain

    λmax(P0 + ΦP0^cΦ^*) = max_x [x^*P0x + x^*ΦP0^cΦ^*x]/(x^*x)
                        ≤ max_x (x^*P0x)/(x^*x) + max_y (y^*P0^c y)/(y^*y) = 2λmax(P0),

    λmin(P0 + ΦP0^cΦ^*) = min_x [x^*P0x + x^*ΦP0^cΦ^*x]/(x^*x)
                        ≥ min_x (x^*P0x)/(x^*x) + min_y (y^*P0^c y)/(y^*y) = 2λmin(P0),

from which the asserted result follows.
Based on the above result we can say that the signals are less correlated in R than in R0. In particular, P may be nonsingular even though P0 is singular. This observation, along with the fact that the performance of many array processing algorithms deteriorates as the signal correlation increases, lies at the basis of the belief that the FB approach should outperform the F-only approach. This would indeed be true if one had two sets of independent data vectors with covariance matrices R0 and R, respectively. However, only R0 and R̂0 correspond to such a data set, whereas R and R̂ do not. In fact, the statistical properties of R̂ are quite different from those of R̂0, and this may very well counterbalance the desirable property that P in R is better conditioned than P0 in R0. Consequently, the FB approach may not lead to any performance enhancement. This discussion provides an intuitive motivation for the superiority of F-MODE over FB-MODE, which will be shown later in the chapter.
2.3.4 Results on R0 and R

The following results are valid for both R0 and R. However, for the sake of simplicity, we state them only for R.

R5. The columns of Es, {ek}_{k=1}^{d}, belong to the range of A:

    ek ∈ R(A).   (2.22)

More exactly,

    Es = A( PA^* Es Λ̃^{−1} ),

where

    Λ̃ = Λs − σ²I.   (2.23)

Proof. A simple calculation shows that

    REs = EsΛs = APA^*Es + σ²Es  ⟹  EsΛ̃ = A( PA^*Es ),

from which the result follows.

R6. Under the assumption made that rank(P) = d,

    En^* A = 0  ⟺  R(En) = N(A^*).
Proof. This result follows at once from R5 and the fact that En^* Es = 0.

R7. The following equality holds:

    PA^* R^{−1} AP = (A^*A)^{−1} A^* Es Λ̃² Λs^{−1} Es^* A (A^*A)^{−1}.   (2.24)

Proof. From

    R = APA^* + σ²I = EsΛsEs^* + σ²EnEn^* = EsΛ̃Es^* + σ²I

we obtain

    P = (A^*A)^{−1} A^* Es Λ̃ Es^* A (A^*A)^{−1}.

Inserting the above expression for P into the left-hand side of (2.24), and making use of R5 and R6 together with R^{−1} = EsΛs^{−1}Es^* + (1/σ²)EnEn^*, yields

    (A^*A)^{−1} A^* Es Λ̃ Es^* A (A^*A)^{−1} A^* ( EsΛs^{−1}Es^* + (1/σ²)EnEn^* ) A (A^*A)^{−1} A^* Es Λ̃ Es^* A (A^*A)^{−1}
        = (A^*A)^{−1} A^* Es Λ̃² Λs^{−1} Es^* A (A^*A)^{−1},

which is what we had to prove.
R8. Let ≃ denote an asymptotically valid equality, that is, an equality in which only the asymptotically dominant terms (in the mean square sense) have been retained. Then,

    B^* Ês ≃ B^* R̂ Es Λ̃^{−1}.   (2.25)

Proof. From the EVD of R̂, we have

    B^* R̂ Ês = B^* Ês Λ̂s,
    B^* R̂ Es + B^* R Ês ≃ B^* Ês Λ̂s,
    B^* R̂ Es ≃ B^* Ês ( Λ̂s − σ²I ),
    B^* Ês ≃ B^* R̂ Es Λ̃^{−1}.

The second and the third steps follow from the facts that B^* R Es = 0 and B^* R = σ² B^*. Since Ês → Es and Λ̂s → Λs as N → ∞, and since B^* Es = 0, the first-order approximation in the last step results.
2.3.5 Results on ABC Estimation

Let z be a complex-valued random vector that depends on an unknown real-valued parameter vector ω, and whose mean and covariance matrix tend to zero (typically as 1/N) as N increases. Then an asymptotically best (in the mean square sense) consistent (ABC) estimate of the parameter vector ω, based on z, is given by the global minimizer of the function [SS89]

    μ^T Cμ^{−1} μ,   (2.26)

where

    μ = [ Re(z)
          Im(z) ],

and where

    Cμ = E{μμ^T}

is assumed to be invertible. The ABC estimate above is the extension of the Markov linear estimator to a class of nonlinear regression equations. The minimization of (2.26) can also be given an approximate (asymptotically valid) ML interpretation (see, e.g., [SS90a]). The estimating criterion (2.26) also arises in certain estimation problems whose solutions are derived by weighted subspace fitting techniques [VO91].

In most applications we derive E{zz^*} and E{zz^T}, but not Cμ directly. It is hence more convenient to work with the vector

    ρ = [ z
          z^c ]   (2.27)

rather than with μ. The next result obtains the function of ρ whose minimization is equivalent to that of (2.26).
R9. Let Cρ = E{ρρ^*}. Then,

    μ^T Cμ^{−1} μ = ρ^* Cρ^{−1} ρ.   (2.28)

Proof. We have

    μ = Lρ,

where

    L = (1/2) [ I    I
                −iI  iI ].

As L is nonsingular and Cμ = LCρL^*,

    μ^T Cμ^{−1} μ = ρ^* L^* ( LCρL^* )^{−1} Lρ = ρ^* Cρ^{−1} ρ,

which concludes the proof.
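R9 and the relation μ = Lρ are easily checked on synthetic data. In the sketch below (ours), the covariances are estimated from samples so that both sides of (2.28) are well defined:

```python
import numpy as np

rng = np.random.default_rng(4)
n, N = 4, 200
Z = rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))

Mu = np.concatenate([Z.real, Z.imag], axis=1)       # rows are mu^T
Rho = np.concatenate([Z, Z.conj()], axis=1)         # rows are rho^T
Cmu = (Mu.T @ Mu) / N                               # sample E{mu mu^T}
Crho = (Rho.T @ Rho.conj()) / N                     # sample E{rho rho^*}

I = np.eye(n)
L = 0.5 * np.block([[I, I], [-1j * I, 1j * I]])     # so that mu = L @ rho

z = Z[0]
mu = np.concatenate([z.real, z.imag])
rho = np.concatenate([z, z.conj()])

q_mu = mu @ np.linalg.solve(Cmu, mu)                # mu^T Cmu^{-1} mu
q_rho = (rho.conj() @ np.linalg.solve(Crho, rho)).real
```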
2.4 Brief Review of F-MODE

Consider the ABC estimator (see Section 2.3) obtained from

    z = vec( B^* Ês0 ),   (2.29)

where vec denotes column-wise vectorization. The ABC criterion corresponding to z above can be shown to be

    Tr{ (B^*B)^{−1} B^* Ês0 Λ̃0² Λs0^{−1} Ês0^* B },   (2.30)

where Tr denotes the trace operator. The F-MODE estimate of ω ≜ [ω1 ... ωd]^T is obtained as an asymptotically valid approximation to the minimizer of (2.30), in the following steps [SS90a]:

1. Obtain Ês0, Λ̂s0, and Λ̂̃0 from the EVD of R̂0. Derive an initial estimate of {bk} by minimizing the quadratic function

    Tr{ B^* Ês0 Λ̂̃0² Λ̂s0^{−1} Ês0^* B }.

2. Let (B̂^*B̂) be the estimate of (B^*B) made from the previously obtained estimates of {bk}. Derive refined estimates of {bk} by minimizing the quadratic function

    Tr{ (B̂^*B̂)^{−1} B^* Ês0 Λ̂̃0² Λ̂s0^{−1} Ês0^* B }.

Possibly repeat this operation once more, using the latest estimate of {bk} to obtain (B̂^*B̂). Finally, derive estimates of {ωk} by rooting the polynomial Σ_{k=0}^{d} b̂k z^k (see (2.11)).

The minimization in steps 1 and 2 above should be conducted under an appropriate constraint on {bk} (to prevent the trivial solution {bk = 0}). Typically, one uses

    Σ_{k=0}^{d} |bk|² = 1.

Additionally, the {bk} should satisfy (2.12).
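Step 1 above can be made concrete with a small numerical sketch. The code below is our illustration, not the thesis implementation, and it is simplified: it works on the exact covariance R0 instead of R̂0 and imposes only the unit-norm constraint, ignoring (2.12). It exploits that Tr{B^*MB}, for a Hermitian weighting matrix M, equals u^*Qu with u = b^c and Q[k, l] = Σ_j M[j+k, j+l], so the constrained minimizer is the eigenvector of Q associated with its smallest eigenvalue:

```python
import numpy as np

m, d, sigma2 = 8, 3, 0.1
omega_true = np.array([0.5, 1.2, 2.1])
A = np.exp(1j * np.outer(np.arange(m), omega_true))
R0 = A @ A.conj().T + sigma2 * np.eye(m)           # exact covariance, P0 = I

w, E = np.linalg.eigh(R0)                          # ascending eigenvalues
Es, lam = E[:, -d:], w[-d:]                        # signal eigenvectors / values
W = np.diag((lam - sigma2) ** 2 / lam)             # weighting Lambda_tilde^2 Lambda_s^{-1}
M = Es @ W @ Es.conj().T

# Tr{B^* M B} = u^* Q u with u = conj(b)
Q = np.zeros((d + 1, d + 1), dtype=complex)
for k in range(d + 1):
    for l in range(d + 1):
        Q[k, l] = sum(M[j + k, j + l] for j in range(m - d))

_, V = np.linalg.eigh(Q)
b = V[:, 0].conj()                                 # minimizing coefficients b0, ..., bd

roots = np.roots(b[::-1])                          # roots of b0 + b1 z + ... + bd z^d
omega_hat = np.sort(np.angle(roots))               # estimated spatial frequencies
```

Because the exact covariance is used here, the recovered frequencies match the true ones essentially to machine precision; with R̂0 from finite data they would only do so approximately.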
Asymptotically (as N increases) the F-MODE estimate is Gaussian distributed with mean equal to the true parameter vector ω and covariance matrix

    CF = (σ²/2N) [ Re( U ⊙ Q0^T ) ]^{−1},   (2.31)

where ⊙ denotes the Hadamard product (i.e., element-wise multiplication), and where

    U = D^* Π⊥A D,   (2.32)
    Q0 = P0 A^* R0^{−1} A P0,   (2.33)

with

    D = [ d(ω1) ... d(ωd) ];  d(ω) = da(ω)/dω,

and

    Π⊥A = I − A(A^*A)^{−1}A^*

is the orthogonal projector onto N(A^*). Because (2.31) is also the CRB matrix for the estimation problem under consideration, it follows that F-MODE is asymptotically statistically efficient [VO91, SS90a, SN90].
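The expressions (2.31)-(2.33) are straightforward to evaluate numerically. The sketch below (ours; the scenario parameters are arbitrary) computes CF for a two-source ULA and checks that it is a valid, i.e., symmetric and positive definite, covariance matrix:

```python
import numpy as np

m, d, N = 8, 2, 500
omega = np.array([0.7, 1.5])
sigma2 = 0.2
k = np.arange(m)
A = np.exp(1j * np.outer(k, omega))                # steering matrix
D = 1j * k[:, None] * A                            # columns d(omega_j) = da/domega
P0 = np.eye(d)
R0 = A @ P0 @ A.conj().T + sigma2 * np.eye(m)

Pi_perp = np.eye(m) - A @ np.linalg.solve(A.conj().T @ A, A.conj().T)
U = D.conj().T @ Pi_perp @ D                       # (2.32)
Q0 = P0 @ A.conj().T @ np.linalg.solve(R0, A @ P0) # (2.33)
CF = sigma2 / (2 * N) * np.linalg.inv((U * Q0.T).real)   # (2.31)
```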
2.5 Derivation of FB-MODE

Similarly to (2.29), FB-MODE is associated with the ABC estimate derived from

    z = vec( B^* Ês ) = [ B^* ê1
                          ⋮
                          B^* êd ].

As shown in Appendix A.1, asymptotically in N,

    E{zz^*} = (σ²/2N) ( Λ̃^{−2}Λs ) ⊗ ( B^*B )   (2.34)

and

    E{zz^T} = (σ²/2N) ( Λ̃^{−2}ΛsΓ^* ) ⊗ ( B^*B J̃ ),   (2.35)
where ⊗ is the Kronecker matrix product operator. Consequently (see (2.27)),

    Cρ ≜ E{ρρ^*} = (σ²/2N) [ (Λ̃^{−2}Λs) ⊗ (B^*B)       (Λ̃^{−2}ΛsΓ^*) ⊗ (B^*B J̃)
                             (Λ̃^{−2}ΛsΓ) ⊗ (J̃ B^*B)    (Λ̃^{−2}Λs) ⊗ (B^*B)^c  ].   (2.36)

Using the fact that (B^*B)^c = J̃ B^*B J̃ (see R1), we can factor Cρ as (we omit the scalar factor σ²/2N in (2.36) since it has no influence on the estimate obtained by minimizing the ABC criterion)

    Cρ = [ (Λ̃^{−2}Λs) ⊗ I   0
           0   (Λ̃^{−2}Λs) ⊗ J̃ ] [ I ⊗ I   Γ^* ⊗ I
                                    Γ ⊗ I   I ⊗ I ] [ I ⊗ (B^*B)   0
                                                      0   I ⊗ (B^*B J̃) ].   (2.37)

The middle matrix in (2.37) can be rewritten as

    [ I ⊗ I   Γ^* ⊗ I        [ I ⊗ I
      Γ ⊗ I   I ⊗ I ]   =      Γ ⊗ I ] [ I ⊗ I   Γ^* ⊗ I ],

and it is evidently singular. Consequently Cρ is singular, and hence the ABC estimating criterion in (2.28) cannot be used as it stands. There are several possibilities for overcoming this difficulty; see, e.g., [SS89, Gor97]. Here we adopt the approach of [Gor97], where it was shown that if Cρ(ε) denotes a nonsingular matrix obtained by a slight perturbation of Cρ, then we can use (2.28) with Cρ(ε) in lieu of Cρ. The minimizer of the so-obtained criterion,

    ρ^* Cρ^{−1}(ε) ρ,   (2.38)

is then an ABC estimate, as the perturbation parameter ε goes to zero. If the limit of (2.38), as ε → 0, cannot be evaluated analytically, then we can obtain an (almost) ABC estimate by minimizing (2.38) with a "very small" ε value. Interestingly enough, in the case under discussion, we can obtain a closed-form expression for the limit criterion. To this end, let

    Γε = (1 − ε)Γ
and let Cρ(ε) be given by (2.37) with Γ replaced by Γε. It can be readily checked that (for ε ≠ 0)

    [ I ⊗ I    Γε^* ⊗ I
      Γε ⊗ I   I ⊗ I ]^{−1} = (1/(ε(2−ε))) [ I ⊗ I     −Γε^* ⊗ I
                                             −Γε ⊗ I   I ⊗ I ],

which implies that

    Cρ^{−1}(ε) = (1/(ε(2−ε))) [ I ⊗ (B^*B)^{−1}   0
                                0   I ⊗ (J̃(B^*B)^{−1}) ] [ I ⊗ I     −Γε^* ⊗ I
                                                            −Γε ⊗ I   I ⊗ I ] [ (Λ̃²Λs^{−1}) ⊗ I   0
                                                                                0   (Λ̃²Λs^{−1}) ⊗ J̃ ]

               = (1/(ε(2−ε))) [ (Λ̃²Λs^{−1}) ⊗ (B^*B)^{−1}          −(Γε^*Λ̃²Λs^{−1}) ⊗ ((B^*B)^{−1}J̃)
                                −(ΓεΛ̃²Λs^{−1}) ⊗ (J̃(B^*B)^{−1})    (Λ̃²Λs^{−1}) ⊗ (J̃(B^*B)^{−1}J̃) ].   (2.39)
Inserting (2.39) into (2.38) yields

    ε(2 − ε) ρ^* Cρ^{−1}(ε) ρ = z^* Ω1 z − z^T Ω2 z − z^* Ω3 z^c + z^T Ω4 z^c,   (2.40)

where

    Ω1 = (Λ̃²Λs^{−1}) ⊗ (B^*B)^{−1},
    Ω2 = (ΓεΛ̃²Λs^{−1}) ⊗ (J̃(B^*B)^{−1}),
    Ω3 = (Γε^*Λ̃²Λs^{−1}) ⊗ ((B^*B)^{−1}J̃),
    Ω4 = (Λ̃²Λs^{−1}) ⊗ (J̃(B^*B)^{−1}J̃).

Using R1 we obtain (observe the definition of Ω1, ..., Ω4 above)

    Ω4 = Ω1^c

and

    Ω3 = Ω2^c.
2.5 Derivation of FB-MODE 51
Hence we can rewrite (2.40) in a more compact form:
\[
\frac{\varepsilon(2-\varepsilon)}{2}\,\rho^* C_\rho^{-1}(\varepsilon)\rho
= z^*\Omega_1 z - \mathrm{Re}\bigl\{z^T\Omega_2 z\bigr\}. \tag{2.41}
\]
Next, we make use of the fact that, for conformable matrices A, X, B, and Y,
\[
\mathrm{Tr}\{AX^*BY\} = \bigl(\mathrm{vec}(X)\bigr)^*\bigl(A^T \otimes B\bigr)\mathrm{vec}(Y)
\]
to obtain
\[
z^*\Omega_1 z = \mathrm{Tr}\{\tilde\Lambda^2\Lambda_s^{-1}\,\hat E_s^* B(B^*B)^{-1}B^*\hat E_s\} \tag{2.42}
\]
and
\[
z^T\Omega_2 z = \mathrm{Tr}\{\Gamma_\varepsilon\tilde\Lambda^2\Lambda_s^{-1}\,\hat E_s^T B^c\tilde J(B^*B)^{-1}B^*\hat E_s\}. \tag{2.43}
\]
Finally, we use R1 and R3 to rewrite (2.43) as
\[
\begin{aligned}
z^T\Omega_2 z
&= (1-\varepsilon)\,\mathrm{Tr}\{\tilde\Lambda^2\Lambda_s^{-1}\,\Gamma\hat E_s^T JB(B^*B)^{-1}B^*\hat E_s\} \\
&= (1-\varepsilon)\,\mathrm{Tr}\{\tilde\Lambda^2\Lambda_s^{-1}\,\hat E_s^* B(B^*B)^{-1}B^*\hat E_s\}
 = (1-\varepsilon)\,z^*\Omega_1 z.
\end{aligned}
\tag{2.44}
\]
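The trace-vec identity above is standard but easy to get wrong in the conjugate placements; the following numerical check (illustrative only, with the column-stacking vec operator assumed throughout) confirms it:

```python
import numpy as np

def vec(M):
    """Column-stacking vec operator: stacks the columns of M into one vector."""
    return M.reshape(-1, order="F")

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
X = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
Y = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))

# Tr{A X* B Y} = vec(X)* (A^T ⊗ B) vec(Y)
lhs = np.trace(A @ X.conj().T @ B @ Y)
rhs = vec(X).conj() @ np.kron(A.T, B) @ vec(Y)
```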
Combining (2.41), (2.42), and (2.44) leads to the function
\[
\rho^* C_\rho^{-1}(\varepsilon)\rho = \frac{2\,z^*\Omega_1 z}{2-\varepsilon},
\]
the limit of which, as ε → 0, yields the FB-MODE criterion:
\[
\mathrm{Tr}\{(B^*B)^{-1}B^*\hat E_s\tilde\Lambda^2\Lambda_s^{-1}\hat E_s^* B\}. \tag{2.45}
\]
Comparing (2.45) with (2.30) we see that the FB-MODE criterion has exactly the same structure as the F-MODE criterion, the only difference being that the former criterion depends on the eigenelements of the FB sample covariance matrix. Owing to this neat result, the FB-MODE estimate of ω can be obtained by applying to the FB sample covariance matrix the two- (or three-) step algorithm outlined in Section 2.4.
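Once the eigenelements are available, the criterion (2.45) is cheap to evaluate. The sketch below is illustrative only (the variable names, and the convention that Λ̃ collects the signal eigenvalues minus the noise variance, are our assumptions, not the thesis's notation):

```python
import numpy as np

def fb_mode_criterion(B, Es_hat, lam_s, lam_tilde):
    """Evaluate Tr{(B*B)^{-1} B* Ês W Ês* B} with W = diag(lam_tilde^2 / lam_s), cf. (2.45).

    B         : m x (m-d) matrix built from the current polynomial coefficients
    Es_hat    : m x d' signal-subspace eigenvectors of the FB sample covariance
    lam_s     : corresponding signal eigenvalues (diagonal of Λs)
    lam_tilde : lam_s minus the noise variance (diagonal of Λ̃; assumed convention)
    """
    W = np.diag(lam_tilde**2 / lam_s)
    M = B.conj().T @ Es_hat                  # B* Ês
    BtB = B.conj().T @ B
    return np.real(np.trace(np.linalg.solve(BtB, M @ W @ M.conj().T)))
```

Because (B*B)^{-1} and M W M* are both Hermitian positive semi-definite, the criterion is real and non-negative.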
2.6 Analysis of FB-MODE
The FB-MODE criterion in (2.45) can be parameterized directly via the parameters of interest, {ωk}. Indeed, it follows from (2.14) that
\[
B(B^*B)^{-1}B^* = \Pi_A^\perp
\]
and hence that (2.45) can be rewritten as
\[
f(\omega) \triangleq \mathrm{Tr}\{\hat W\hat E_s^*\Pi_A^\perp\hat E_s\}, \tag{2.46}
\]
where
\[
\hat W = \hat{\tilde\Lambda}^2\hat\Lambda_s^{-1}.
\]
Later we will also use the notation W = Λ̃²Λs⁻¹. The reason that Ŵ appears in (2.46), rather than W as in (2.45), is that the estimator should be based on the sample data. It will, however, be shown that it is only the limit matrix W that affects the asymptotic performance. Let
\[
g = \frac{\partial f(\omega)}{\partial\omega}
\]
denote the gradient vector of (2.46), let
\[
H = \frac{\partial^2 f(\omega)}{\partial\omega\,\partial\omega^T}
\]
denote the corresponding Hessian matrix, and let
\[
H_\infty = \lim_{N\to\infty} H
\]
denote the asymptotic Hessian. Using this notation and a standard Taylor series expansion technique (the details of which are omitted in the interest of brevity; see [VO91, SS90a] and [SN90] for similar analyses), we can show that the covariance matrix of the FB-MODE estimate of ω is asymptotically (for N ≫ 1) given by
\[
C_{FB} = H_\infty^{-1}\bigl[E(gg^T)\bigr]H_\infty^{-1}. \tag{2.47}
\]
It remains to evaluate H∞ and the middle matrix in (2.47).
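For concreteness, the cost (2.46) can be evaluated directly on a frequency grid or inside a search routine. This sketch is illustrative (not the thesis's code) and assumes the standard ULA steering vector a(ω) = [1, e^{iω}, ..., e^{i(m−1)ω}]^T:

```python
import numpy as np

def ula_steering(omega, m):
    """Columns a(ω_k) = [1, e^{iω_k}, ..., e^{i(m-1)ω_k}]^T of a ULA steering matrix."""
    return np.exp(1j * np.outer(np.arange(m), np.atleast_1d(omega)))

def f_cost(omega, Es_hat, W_hat):
    """FB-MODE cost f(ω) = Tr{ Ŵ Ês* Π⊥_A Ês }, cf. (2.46)."""
    A = ula_steering(omega, Es_hat.shape[0])
    proj = A @ np.linalg.solve(A.conj().T @ A, A.conj().T)   # A (A*A)^{-1} A*
    Pperp = np.eye(A.shape[0]) - proj                        # Π⊥_A
    return np.real(np.trace(W_hat @ Es_hat.conj().T @ Pperp @ Es_hat))
```

In the noise-free case Ês spans the range of A at the true frequencies, so the cost vanishes there and is strictly positive elsewhere (generically).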
Let Ai denote the derivative of A with respect to ωi:
\[
A_i = \frac{\partial A}{\partial\omega_i} = d_i u_i^*. \tag{2.48}
\]
Here, d_i is a short notation for d(ωi), and u_i = [0, ..., 0, 1, 0, ..., 0]^T (with the 1 in the ith position). A simple calculation shows that
\[
\begin{aligned}
\frac{\partial\Pi_A^\perp}{\partial\omega_i}
&= -A_i(A^*A)^{-1}A^*
 + A(A^*A)^{-1}\bigl(A_i^*A + A^*A_i\bigr)(A^*A)^{-1}A^*
 - A(A^*A)^{-1}A_i^* \\
&= -\Pi_A^\perp A_i(A^*A)^{-1}A^* - A(A^*A)^{-1}A_i^*\Pi_A^\perp.
\end{aligned}
\tag{2.49}
\]
Next note that, for conformable matrices X and Y with Y Hermitian, we have
\[
\mathrm{Tr}\{(X + X^*)Y\} = 2\,\mathrm{Re}\bigl\{\mathrm{Tr}\{XY\}\bigr\}. \tag{2.50}
\]
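Formula (2.49) can be checked against a central finite difference of the projector; the sketch below (illustrative, assuming a standard ULA and its derivative vectors d(ω) = ∂a(ω)/∂ω) is one way to do so:

```python
import numpy as np

def steering(omega, m):
    """ULA steering matrix with columns a(ω_k) = [1, e^{iω_k}, ..., e^{i(m-1)ω_k}]^T."""
    return np.exp(1j * np.outer(np.arange(m), np.atleast_1d(omega)))

def proj_perp(A):
    """Π⊥_A = I - A (A*A)^{-1} A*."""
    return np.eye(A.shape[0]) - A @ np.linalg.solve(A.conj().T @ A, A.conj().T)

def dproj_perp(A, Ai):
    """Closed-form derivative (2.49)."""
    Pp = proj_perp(A)
    G = np.linalg.inv(A.conj().T @ A)
    return -Pp @ Ai @ G @ A.conj().T - A @ G @ Ai.conj().T @ Pp

m = 5
omega = np.array([0.4, -0.9])
A = steering(omega, m)
k = np.arange(m)
d0 = 1j * k * np.exp(1j * k * omega[0])      # d(ω_1) = ∂a/∂ω at ω_1
A0 = np.outer(d0, np.array([1.0, 0.0]))      # A_1 = d_1 u_1^*
h = 1e-6
num = (proj_perp(steering(omega + [h, 0], m))
       - proj_perp(steering(omega - [h, 0], m))) / (2 * h)
```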
Using (2.49) and (2.50) we obtain
\[
g_i = -2\,\mathrm{Re}\Bigl\{\mathrm{Tr}\bigl[\hat W\hat E_s^* A(A^*A)^{-1}A_i^*\Pi_A^\perp\hat E_s\bigr]\Bigr\}. \tag{2.51}
\]
Inserting (2.48) in (2.51) gives
\[
g_i \simeq -2\,\mathrm{Re}\Bigl\{d_i^*\Pi_A^\perp\hat E_s W E_s^* A(A^*A)^{-1}u_i\Bigr\}.
\]
Taking the derivative of the right hand side of (2.51) with respect to ωj, we obtain
\[
\begin{aligned}
H_{ij} &\simeq 2\,\mathrm{Re}\Bigl\{d_i^*\Pi_A^\perp A_j(A^*A)^{-1}A^* E_s W E_s^* A(A^*A)^{-1}u_i\Bigr\} \\
&= 2\,\mathrm{Re}\Bigl\{\bigl(d_i^*\Pi_A^\perp d_j\bigr)\Bigl[\bigl((A^*A)^{-1}A^* E_s W E_s^* A(A^*A)^{-1}\bigr)^T\Bigr]_{ij}\Bigr\}
 = \bigl[H_\infty\bigr]_{ij}.
\end{aligned}
\]
Making use of R7 yields the following expression for the asymptotic Hessian matrix,
\[
H_\infty = 2\,\mathrm{Re}\bigl\{U \odot Q^T\bigr\},
\]
where U is as defined in (2.32), and Q is defined similarly to Q0 in (2.33),
\[
Q = PA^*R^{-1}AP.
\]
To derive an expression for the middle matrix in (2.47) we need the additional notation
\[
\alpha_i^* = d_i^* B(B^*B)^{-1}, \qquad
\beta_i = WE_s^* A(A^*A)^{-1}u_i,
\]
with the use of which we can write
\[
-g_i \simeq 2\,\mathrm{Re}\bigl\{\alpha_i^*(B^*\hat E_s)\beta_i\bigr\}
 = 2\,\mathrm{Re}\bigl\{(\beta_i^T \otimes \alpha_i^*)z\bigr\}.
\]
It follows that
\[
\begin{aligned}
E[g_ig_j]
&\simeq 2\,\mathrm{Re}\bigl\{(\beta_i^T \otimes \alpha_i^*)E[zz^*](\beta_j^c \otimes \alpha_j)
 + (\beta_i^T \otimes \alpha_i^*)E[zz^T](\beta_j \otimes \alpha_j^c)\bigr\} \\
&\triangleq 2\,\mathrm{Re}(T_1 + T_2).
\end{aligned}
\tag{2.52}
\]
Next, we note that (cf. (2.34))
\[
\frac{2N}{\sigma^2}T_1
= (\beta_i^T\tilde\Lambda^{-2}\Lambda_s\beta_j^c)(\alpha_i^* B^*B\alpha_j)
= (\beta_j^*\tilde\Lambda^{-2}\Lambda_s\beta_i)(\alpha_i^* B^*B\alpha_j)
= Q_{ij}^T U_{ij}. \tag{2.53}
\]
A convenient expression for T2 is slightly more difficult to obtain. First, we note that (cf. (2.35))
\[
\frac{2N}{\sigma^2}T_2
= (\beta_i^T\tilde\Lambda^{-2}\Lambda_s\Gamma^*\beta_j)(\alpha_i^* B^*B\tilde J\alpha_j^c). \tag{2.54}
\]
Next, we make use of R3 and (2.7) to write
\[
\begin{aligned}
\Gamma^*\beta_j
&= W(\Gamma^* E_s^*)A(A^*A)^{-1}u_j = WE_s^T(JA)(A^*A)^{-1}u_j \\
&= WE_s^T A^c\Phi^*(A^*A)^{-1}u_j = WE_s^T A^c(A^*JA^c)^{-1}u_j \\
&= WE_s^T A^c(\Phi A^TA^c)^{-1}u_j = WE_s^T A^c(A^TA^c)^{-1}e^{i(m-1)\omega_j}u_j \\
&= \beta_j^c\,e^{i(m-1)\omega_j},
\end{aligned}
\]
which gives
\[
\beta_i^T\tilde\Lambda^{-2}\Lambda_s\Gamma^*\beta_j
= \beta_i^T\tilde\Lambda^{-2}\Lambda_s\beta_j^c\,e^{i(m-1)\omega_j}
= (\beta_j^*\tilde\Lambda^{-2}\Lambda_s\beta_i)\,e^{i(m-1)\omega_j}
= Q_{ij}^T\,e^{i(m-1)\omega_j}. \tag{2.55}
\]
Finally, we use R1 and R2 to obtain
\[
\tilde J\alpha_j^c = (B^TB^c\tilde J)^{-1}B^Td_j^c = (\tilde JB^*B)^{-1}B^Td_j^c
= (B^*B)^{-1}B^*Jd_j^c = (B^*B)^{-1}B^*d_j\,e^{-i(m-1)\omega_j},
\]
which yields
\[
\alpha_i^*(B^*B)\tilde J\alpha_j^c
= d_i^* B(B^*B)^{-1}B^*d_j\,e^{-i(m-1)\omega_j}
= U_{ij}\,e^{-i(m-1)\omega_j}. \tag{2.56}
\]
Inserting (2.55) and (2.56) into (2.54) gives the desired expression for T2:
\[
\frac{2N}{\sigma^2}T_2 = Q_{ij}^T U_{ij}. \tag{2.57}
\]
Combining (2.52), (2.53), and (2.57) leads to the following expression for the asymptotic covariance matrix of FB-MODE:
\[
C_{FB} = \frac{\sigma^2}{2N}\Bigl[\mathrm{Re}\bigl\{U \odot Q^T\bigr\}\Bigr]^{-1}. \tag{2.58}
\]
Comparing (2.58) and (2.31) we see that the only difference between CFB and CF is that Q0 in CF is replaced by Q in CFB. If P0 is diagonal then Q = Q0 and hence CFB = CF. However, if P0 is non-diagonal (i.e., the signals are correlated) then, in general, Q ≠ Q0. To see this, consider for example a case where P0 is singular, which implies that Q0 is singular as well; however, P, and hence Q, may well be nonsingular. This example also shows that the difference matrix Q0 − Q is not necessarily positive semi-definite, in spite of the fact that, by the CRB inequality, we must have (A ≥ B below means that the matrix A − B is positive semi-definite)
\[
C_{FB} \ge C_F \tag{2.59}
\]
\[
\Longleftrightarrow \quad \mathrm{Re}\bigl[U \odot (Q_0 - Q)^T\bigr] \ge 0. \tag{2.60}
\]
(Of course, this is no contradiction because Q0 ≥ Q is sufficient for (2.60) to hold, but it is not necessary.)
As already indicated above, the inequality in (2.59) is usually "strict" in the sense that CFB ≠ CF. Whenever this happens, FB-MODE is asymptotically statistically less efficient than F-MODE. To study the difference (CFB − CF) quantitatively, as well as the extent to which the asymptotic results derived above hold for samples of practical lengths, we resort to numerical simulations (see the next section).
2.7 Numerical Examples

In this section we will study the difference in performance between F-MODE and FB-MODE by means of a two-signal scenario. It is a simple exercise to verify that the covariance matrices P and P0 are identical for a certain correlation phase between the two signals. Indeed, let the signal covariance matrix be given by
\[
P_0 = \begin{bmatrix} p_{11} & p_{12} \\ p_{12}^* & p_{22} \end{bmatrix}
\]
with p12 = ξe^{iφ}. If the correlation phase is
\[
\phi = \tfrac{1}{2}(m-1)(\omega_2 - \omega_1) + k\pi,
\]
where k is any integer, then P = P0. This implies that R = R0 and hence that CF = CFB. Thus, in this special case, the large sample performance is the same for F-MODE and FB-MODE. However, as previously mentioned, in general CFB > CF. In the following, we choose φ so as to emphasize the difference between F-MODE and FB-MODE.

Consider a ULA consisting of four omni-directional and identical sensors separated by half of the carrier's wavelength. The two signals impinge on the array from θ1 = −7.5° and θ2 = 7.5° relative to broadside. These directions correspond to the spatial (angular) frequencies ωi = π sin(θi) (i = 1, 2). The signal covariance is
\[
P_0 = 10^{\mathrm{SNR}/10}
\begin{bmatrix} 1 & 0.99\,e^{i\pi/4} \\ 0.99\,e^{-i\pi/4} & 1 \end{bmatrix},
\]
where SNR is expressed in decibels (dB). The correlation phase of π/4 = 45° is (close to) the "worst case," that is, the case where the difference between CFB and CF is most significant (given the other parameters). The noise variance is fixed to σ² = 1.
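The claim that P and P0 coincide exactly at the special correlation phase can be checked numerically. The sketch below is illustrative (the helper names are ours, and it uses the standard ULA conjugate-symmetry J a^c(ω) = a(ω) e^{-i(m-1)ω}, stated in the comments as an assumption):

```python
import numpy as np

m = 4
theta = np.deg2rad([-7.5, 7.5])
omega = np.pi * np.sin(theta)                  # spatial frequencies of the two signals

def ula(omegas):
    """Steering matrix of an m-element half-wavelength ULA."""
    return np.exp(1j * np.outer(np.arange(m), omegas))

def P0_of(phi, xi=0.99):
    """Signal covariance with correlation magnitude ξ and phase φ (SNR factor omitted)."""
    return np.array([[1.0, xi * np.exp(1j * phi)],
                     [xi * np.exp(-1j * phi), 1.0]])

def fb_average(R):
    """Forward-backward averaging: (R + J R^c J)/2, with J the exchange matrix."""
    J = np.eye(m)[::-1]
    return 0.5 * (R + J @ R.conj() @ J)

def fb_signal_cov(P0):
    """Effective signal covariance after FB averaging: P = (P0 + Φ P0^c Φ*)/2,
    using J a^c(ω) = a(ω) e^{-i(m-1)ω}, i.e. Φ = diag(e^{-i(m-1)ω_k})."""
    Phi = np.diag(np.exp(-1j * (m - 1) * omega))
    return 0.5 * (P0 + Phi @ P0.conj() @ Phi.conj().T)
```

At φ = (m−1)(ω2 − ω1)/2 the two covariances coincide, whereas at the simulated φ = π/4 they differ markedly.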
Figure 2.1: Mean square errors for θ1 (in degrees²) for F-MODE (+) and FB-MODE (o) versus the number of snapshots N. The solid line is the theoretical (asymptotic) MSE for F-MODE as given in (2.31), and the dashed line is the theoretical MSE for FB-MODE given in (2.58).
In the first example, SNR = 0 dB. The mean-square errors (MSEs) for F-MODE and FB-MODE are compared for different sample lengths in Figure 2.1. (Only the MSE values for θ1 are shown; the MSE plot corresponding to θ2 is similar.) The sample MSEs are based on 1000 independent trials. In the simulations, two steps of the algorithm given in Section 2.4 were used both for F-MODE and FB-MODE. The MSE predicted by the large sample analysis is also depicted in Figure 2.1. It can be seen that the theoretical and simulation results are similar even for quite small sample lengths. There is clearly no benefit in using FB-MODE in this case, not even for small samples.
In the second example, the number of samples is fixed to N = 100 and the SNR is varied. All other parameters are as before. The MSEs (computed from 1000 trials) are shown in Figure 2.2. We see that F-MODE outperforms FB-MODE for all SNR values.

Figure 2.2: Mean square errors for θ1 (in degrees²) for F-MODE (+) and FB-MODE (o) versus the SNR. The solid line is the theoretical (asymptotic) MSE for F-MODE as given in (2.31), and the dashed line is the theoretical MSE for FB-MODE given in (2.58).
2.8 Conclusions

MODE is an angle-of-arrival estimator which is asymptotically statistically efficient. The standard MODE algorithm is used with the forward-only sample covariance matrix of the output of a sensor array. Here, we have applied the MODE principle to the forward-backward sample covariance matrix to obtain the FB-MODE estimator. We have shown that FB-MODE simply consists of applying the standard MODE algorithm to the eigenelements of the forward-backward covariance matrix. We have also shown that the standard MODE algorithm outperforms FB-MODE, from a statistical standpoint, in large samples. Numerical examples were presented to show that these results most often hold for small samples or
low signal-to-noise ratios as well. Thus, forward-backward averaging should in general not be used in conjunction with MODE.
Chapter 3

Weighted Subspace Fitting for General Array Error Models
Model error sensitivity is an issue common to all high resolution direction of arrival estimators. Much attention has been directed to the design of algorithms for minimum variance estimation taking only finite sample errors into account. Approaches to reduce the sensitivity due to array calibration errors have also appeared in the literature. Herein, one such approach is adopted which assumes that the errors due to finite samples and model errors are of comparable size. Minimum variance estimators have previously been proposed for this case, but these estimators typically lead to non-linear optimization problems and are not in general consistent if the source signals are fully correlated. In this chapter, a weighted subspace fitting method for general array perturbation models is derived. This method provides minimum variance estimates under the assumption that the prior distribution of the perturbation model is known. Interestingly, the method reduces to the WSF (MODE) estimator if no model errors are present. Conversely, assuming that model errors dominate, the method specializes to the corresponding "model-errors-only" subspace fitting method. Unlike previous techniques for model errors, the estimator can be implemented using a two-step procedure if the nominal array is uniform and linear, and it is also consistent even if the signals are fully correlated. The chapter also contains a large sample analysis of one of the
alternative methods, namely, MAPprox. It is shown that MAPprox also provides minimum variance estimates under reasonable assumptions.
3.1 Introduction

All signal parameter estimation methods in array signal processing rely on information about the array response, and assume that the signal wavefronts impinging on the array have perfect spatial coherence (e.g., perfect plane waves). The array response may be determined either by empirical measurements, a process referred to as array calibration, or by making certain assumptions about the sensors in the array and their geometry (e.g., identical sensors in known locations).

Unfortunately, an array cannot be perfectly calibrated, and analytically derived array responses relying on the array geometry and wave propagation are at best good approximations. Due to changes in antenna location, temperature, and the surrounding environment, the response of the array may be significantly different than when it was last calibrated. Furthermore, the calibration measurements themselves are subject to gain and phase errors, and they can only be obtained for discrete angles, thus necessitating interpolation techniques for uncalibrated directions. For the case of analytically calibrated arrays of nominally identical, identically oriented elements, errors result since the elements are not truly identical and their locations are not precisely known.
Because of the many sources of error listed above, the limiting factor in the performance of array signal processing algorithms is most often not measurement noise, but rather perturbations in the array response model. Depending on the size of such errors, estimates of the directions of arrival (DOAs) and the source signals may be significantly degraded. A number of studies have been conducted to quantify the performance degradation due to model errors for both DOA [ZW88, WWN88, SK90, Fri90a, Fri90b, SK92, SK93, LV92, VS94b, SH92, KSS94, KSS96] and signal estimation [RH80, Com82, Qua82, FW94, YS95]. These analyses have assumed either a deterministic model for the errors, using bias as the performance metric, or a stochastic model, where variance is typically calculated.
A number of techniques have also been considered for improving the robustness of array processing algorithms. In one such approach, the array response is parameterized not only by the DOAs of the signals, but also by perturbation or "nuisance" parameters that describe deviations
of the response from its nominal value. These parameters can include, for example, displacements of the antenna elements from their nominal positions, uncalibrated receiver gain and phase offsets, etc. With such a model, a natural approach is to attempt to estimate the unknown nuisance parameters simultaneously with the signal parameters. Such methods are referred to as auto-calibration techniques, and have been proposed by a number of authors, including [PK85, RS87a, RS87b, WF89, WOV91, WRM94, VS94a, FFLC95, Swi96].
When auto-calibration techniques are employed, it is critical to determine whether both the signal and nuisance parameters are identifiable. In certain cases they are not; for example, one cannot uniquely estimate both DOAs and sensor positions unless of course additional information is available, such as sources in known locations [LM87, WF89, NN93, KF93], cyclostationary signals with two or more known cycle frequencies [YZ95], or partial information about the phase response of the array [MR94]. The identifiability problem can be alleviated if the perturbation parameters are assumed to be drawn from some known a priori distribution. While this itself represents a form of additional information, it has the advantage of allowing an optimal maximum a posteriori (MAP) solution to the problem to be formulated [VS94a]. In [VS94a] it is shown that, by using an asymptotically equivalent approximation to the resulting MAP criterion, the estimation of the signal and nuisance parameters can be decoupled, leading to a significant simplification of the problem.
Another technique for reducing the sensitivity of array processing algorithms to model errors is the use of statistically optimal weightings based on probabilistic models for the errors. Optimal weightings that ignore the finite sample effects of noise have been developed for MUSIC [SK92] and the general subspace fitting technique [SK93]. A more general analysis in [VS94b] included the effects of noise, and presented a weighted subspace fitting (WSF) method whose DOA estimates asymptotically achieve the Cramér-Rao bound (CRB), but only for a very special type of array error model. On the other hand, the MAP approach in [VS94a] is asymptotically statistically efficient for very general error models. However, since it is implemented by means of noise subspace fitting [OVSN93], its finite sample performance may be poor if the sources are highly correlated or closely spaced in angle. In fact, the method of [VS94a] is not a consistent estimator of the DOAs if the signals are perfectly coherent.
In this chapter, we develop a statistically efficient weighted signal subspace fitting (SSF) algorithm that holds for very general array perturbation models, that has much better finite sample performance than [VS94a] when the signals are highly correlated or closely spaced, and that yields consistent estimates when coherent sources are present. An additional advantage of our SSF formulation is that if the array is nominally uniform and linear, a two-step procedure similar to that for MODE [SS90a] can be used to eliminate the search for the DOAs. Furthermore, as part of our analysis we show that the so-called MAPprox technique described in [WOV91] also asymptotically achieves the CRB, and we draw some connections between it and the new SSF algorithm presented herein.
The weighted SSF method presented here differs from other WSF techniques such as [VO91, SK93, VS94b] in that every element of the vectors spanning the signal subspace has its own individual weighting [OAKP97]. In these earlier algorithms, the form of the weighting forced certain subsets of the elements to share a particular weight. It is the generalization to individual weights that is the key to the algorithm's optimality for very general array response perturbation models. For this reason, we will refer to the algorithm as generalized weighted subspace fitting (GWSF).
The chapter is organized as follows: In the next section, the data model is briefly introduced. In Section 3.3, more insight into the studied problem is given by presenting previous approaches. In Section 3.4, the proposed GWSF estimator is introduced, and its statistical properties are analyzed in Section 3.5. Details about the implementation of the algorithm are given in Section 3.6. In Section 3.7, we show that the MAPprox estimator has very close connections with GWSF and inherits the same asymptotic properties as GWSF. Finally, the theoretical observations are illustrated by numerical examples in Section 3.8.
3.2 Problem Formulation

Assume that the output of an array of m sensors is given by the model
\[
x(t) = A(\theta,\rho)s(t) + n(t),
\]
where s(t) is a complex d-vector containing the emitted signal waveforms and n(t) is an additive noise vector. The array steering matrix is defined as
\[
A(\theta,\rho) = \bigl[\,\bar a(\theta_1,\rho) \;\; \ldots \;\; \bar a(\theta_d,\rho)\,\bigr],
\]
where ā(θi,ρ) denotes the array response to a unit waveform associated with the signal parameter θi (possibly vector valued, although we will specialize to the scalar case). The parameters in the real n-vector ρ are used to model the uncertainty in the array steering matrix. It is assumed that A(θ,ρ) is known for a nominal value ρ = ρ0 and that the columns in A(θ,ρ0) are linearly independent as long as θi ≠ θj, i ≠ j. The model for A(θ,ρ0) can be obtained for example by physical insight, or it could be the result of a calibration experiment. One may then have knowledge about the sensitivity of the nominal response to certain variations in ρ, which can be modeled by considering ρ as a random vector. The a priori information is then in the form of a probability distribution which can be used in the design of an estimation algorithm to make it robust with regard to the model errors.
We will use a stochastic model for the signals; more precisely, s(t) is considered to be a zero-mean Gaussian random vector with second order moments
\[
E\{s(t)s^*(s)\} = P\,\delta_{t,s}, \qquad E\{s(t)s^T(s)\} = 0,
\]
where (·)* denotes complex conjugate transpose, (·)^T denotes transpose, and δ_{t,s} is the Kronecker delta. Let d′ denote the rank of the signal covariance matrix P. Special attention will be given to cases where d′ < d, i.e., cases in which the signals may be fully correlated (coherent). In particular, we show that the proposed method is consistent even under such severe conditions. The noise is modeled as a zero-mean spatially and temporally white complex Gaussian vector with second order moments
\[
E\{n(t)n^*(s)\} = \sigma^2 I\,\delta_{t,s}, \qquad E\{n(t)n^T(s)\} = 0.
\]
The perturbation parameter vector ρ is modeled as a Gaussian random variable with mean E{ρ} = ρ0 and covariance
\[
E\{(\rho-\rho_0)(\rho-\rho_0)^T\} = \Omega.
\]
It is assumed that both ρ0 and Ω are known. Similarly to [VS94b, VS94a], we consider small perturbations in ρ and, more specifically, assume that the effect of the array errors on the estimates of θ is of comparable size to that of the finite sample effects of the noise. To model this, we
assume that
\[
\Omega = \bar\Omega/N,
\]
where N is the number of samples and Ω̄ is independent of N. This somewhat artificial assumption concerning the model errors is made only for convenience in showing the statistical optimality of the algorithm presented later. An identical result can be obtained for small ρ − ρ0 if a first order perturbation analysis is used, but this approach is somewhat less elegant. The assumption that ρ − ρ0 = Op(1/√N) implies that the calibration errors and finite sample effects are of comparable size. (The notation ξ = Op(1/√N) means that √N ξ tends to a constant in probability as N → ∞.) Despite this fact, we will see that the resulting algorithm is optimal also for the following limiting cases:

• N → ∞ for fixed (ρ − ρ0) – (model errors only),

• (ρ − ρ0) → 0 for fixed N – (finite sample errors only).

In other words, for each of the above cases, the proposed technique converges to an algorithm that has previously been shown to be statistically efficient for that case. We can thus conclude that the algorithm is globally optimal, not just under the assumption ρ − ρ0 = Op(1/√N).
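As an illustration of this error model (a sketch, not the thesis's simulation code), one can draw ρ from N(ρ0, Ω̄/N) and generate snapshots accordingly. Here ρ is taken, purely as a hypothetical example, to be the sensor position errors of a nominal half-wavelength ULA:

```python
import numpy as np

def simulate_array_data(N, theta, rho0, Omega_bar, sigma2=1.0, rng=None):
    """Draw N snapshots of x(t) = A(θ, ρ)s(t) + n(t) with ρ ~ N(ρ0, Ω̄/N).

    Hypothetical perturbation model: ρ collects the m sensor position errors
    (in half-wavelengths) of a nominal ULA, so Ω = Ω̄/N shrinks with N as assumed.
    """
    rng = np.random.default_rng(rng)
    m = len(rho0)
    rho = rng.multivariate_normal(rho0, Omega_bar / N)      # one draw per realization
    pos = np.arange(m) + rho                                # perturbed element positions
    A = np.exp(1j * np.pi * np.outer(pos, np.sin(theta)))   # A(θ, ρ)
    d = len(theta)
    s = (rng.standard_normal((d, N)) + 1j * rng.standard_normal((d, N))) / np.sqrt(2)
    n = np.sqrt(sigma2 / 2) * (rng.standard_normal((m, N)) + 1j * rng.standard_normal((m, N)))
    return A @ s + n
```

With P = I and unit-magnitude steering elements, the average power per sensor is d + σ².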
3.3 Robust Estimation

This section introduces some of the previous approaches taken to solve the problem considered in this chapter. The Cramér-Rao lower bound is also restated from [VS94a].
3.3.1 MAP

As the problem has been formulated, it is natural to use a maximum a posteriori (MAP) estimator. Following [WOV91], the cost function to be minimized for simultaneous estimation of signal and perturbation parameters is
\[
V_{\mathrm{MAP}}(\theta,\rho) = V_{\mathrm{ML}}(\theta,\rho) + \tfrac{1}{2}(\rho-\rho_0)^T\Omega^{-1}(\rho-\rho_0).
\]
V_ML(θ,ρ) is the concentrated negative log-likelihood function and is given by (see, e.g., [Böh86, Jaf88])
\[
V_{\mathrm{ML}}(\theta,\rho) = N\log\det\{A\hat PA^* + \hat\sigma^2 I\},
\]
where for brevity we have suppressed the arguments and where constant
terms have been neglected. Here, P̂ = P̂(θ, ρ) and σ̂² = σ̂²(θ, ρ) denote
the maximum likelihood estimates of P and σ², respectively. They are
explicitly given by:

    σ̂²(θ, ρ) = (1/(m − d)) Tr{Π⊥A(θ, ρ) R̂},
    P̂(θ, ρ) = A†(θ, ρ) [ R̂ − σ̂²(θ, ρ) I ] A†*(θ, ρ),          (3.1)

where

    R̂ = (1/N) Σₜ₌₁ᴺ x(t) x*(t),
    A† = (A*A)⁻¹ A*,
    Π⊥A = I − A A†.
For large N, it is further argued in [WOV91] that VML(θ, ρ) can
be replaced with any other cost function that has the same large sample
behavior as ML. In particular, it was proposed that the weighted subspace
fitting (WSF) cost function [VO91] be used. This results in

    VMAP-WSF(θ, ρ) = (N/σ̂²) VWSF(θ, ρ) + (1/2)(ρ − ρ0)ᵀ Ω⁻¹ (ρ − ρ0),   (3.2)

where

    VWSF(θ, ρ) = Tr{Π⊥A(θ, ρ) Ês ŴWSF Ês*}.                   (3.3)

Here, Ês and ŴWSF are defined via the eigendecomposition of R̂,

    R̂ = Ês Λ̂s Ês* + Ên Λ̂n Ên*,                                (3.4)

where Λ̂s is a diagonal matrix containing the d′ largest eigenvalues and
Ês is composed of the corresponding eigenvectors. The WSF weighting
matrix is defined as

    ŴWSF = Λ̃² Λ̂s⁻¹,

where Λ̃ = Λ̂s − σ̂² I and σ̂² is a consistent estimate of σ² (for example,
the average of the m − d′ smallest eigenvalues of R̂). The normalizing
factor N/σ̂² in (3.2) is important to make the partial derivatives of
(N/σ̂²) VWSF and VML with respect to θ and ρ of comparable size locally
around {θ0, ρ}. More precisely,
    ∂VML/∂η |{θ0,ρ} = (N/σ̂²) ∂VWSF/∂η |{θ0,ρ} + op(√N),

where η denotes a generic component of the vector [θᵀ ρᵀ]ᵀ, and where
op(·) represents a term that tends to zero faster than its argument in
probability. The minimization of either VMAP or VMAP-WSF is over both
θ and ρ. The idea of the methods presented in the following two sections
is to eliminate the need for an explicit minimization over ρ.
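As a concrete illustration of the quantities above, the eigendecomposition (3.4), the WSF weighting, and the cost (3.3) can be evaluated numerically. The following Python/NumPy sketch assumes a nominal uniform linear array; the helper `steering`, the spacing `delta`, and all numerical values are our own illustrative choices, not taken from the thesis:

```python
import numpy as np

def steering(m, thetas, delta=0.5):
    # Nominal ULA steering matrix; delta is the sensor spacing in
    # wavelengths (an illustrative choice, not fixed by the text).
    k = np.arange(m)[:, None]
    return np.exp(2j * np.pi * delta * k * np.sin(np.asarray(thetas))[None, :])

def wsf_cost(R, thetas, d):
    # Eigendecomposition (3.4); eigh returns ascending order, so reverse.
    lam, E = np.linalg.eigh(R)
    lam, E = lam[::-1], E[:, ::-1]
    m = R.shape[0]
    Es, Lam_s = E[:, :d], np.diag(lam[:d])
    sig2 = lam[d:].mean()                     # average of the m - d' smallest eigenvalues
    Lam_t = Lam_s - sig2 * np.eye(d)          # Lambda_tilde = Lambda_s - sigma^2 I
    W = Lam_t @ Lam_t @ np.linalg.inv(Lam_s)  # WSF weighting
    A = steering(m, thetas)
    Pperp = np.eye(m) - A @ np.linalg.solve(A.conj().T @ A, A.conj().T)
    return np.real(np.trace(Pperp @ Es @ W @ Es.conj().T))  # cost (3.3)
```

With the population covariance R = A0 P A0* + σ²I, the cost is zero at the true angles and positive elsewhere, which is the basis of the fitting criterion.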
3.3.2 MAPprox

In [WOV91] it is suggested that the minimization of VMAP-WSF over ρ can
be done implicitly. Assuming that ρ − ρ0 is small, a second order Taylor
expansion of VMAP-WSF around ρ0 is performed, and then the result is
analytically minimized with regard to ρ. This leads to a concentrated
form of the cost function

    VMAPprox(θ) = (N/σ̂²) { VWSF − (1/2)(∂ρVWSF)ᵀ [ ∂ρρVWSF + (σ̂²/N) Ω⁻¹ ]⁻¹ ∂ρVWSF },   (3.5)

where ∂ρVWSF denotes the gradient of VWSF with respect to ρ and
∂ρρVWSF denotes the Hessian. The right hand side of (3.5) is evaluated
at ρ0 and θ. The method is referred to as MAPprox. In Section 3.7 we
analyze the asymptotic properties of the MAPprox estimates and point
out its relationship to the GWSF estimator derived in Section 3.4.
3.3.3 MAP-NSF

Another method for approximating the MAP estimator is the MAP-NSF
approach of [VS94a]. In MAP-NSF, one uses the fact that the noise
subspace fitting (NSF) method is also asymptotically equivalent to ML
[OVSN93]. Thus, in lieu of using the WSF criterion in (3.2), the NSF
cost function

    VNSF = Tr{A* Ên Ên* A Û}                                   (3.6)

is employed. Here, Û is a consistent estimate of the matrix

    U = A† Es Λ̃² Λs⁻¹ Es* A†*.

As shown below, this also leads to an explicit solution for ρ when ρ − ρ0
is small.

The NSF cost function can be rewritten using the following general
identities:

    Tr{AB} = vecᵀ(Aᵀ) vec(B),                                  (3.7)
    vec(ABC) = (Cᵀ ⊗ A) vec(B),                                (3.8)

where vec(·) is the vectorization operation and ⊗ denotes the Kronecker
matrix product [Bre78, Gra81]. An equivalent form of (3.6) is then given
by

    VNSF = a* M̂ a,

where

    a = vec(A),
    M̂ = Ûᵀ ⊗ (Ên Ên*).                                        (3.9)
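The identities (3.7) and (3.8) are standard and easy to check numerically; a small NumPy verification (the matrix dimensions are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))
B = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
C = rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5))

# Column-stacking vectorization, as used in (3.7)-(3.8).
vec = lambda M: M.reshape(-1, order="F")

# Identity (3.7): Tr{AB} = vec(A^T)^T vec(B)
assert np.isclose(np.trace(A @ B), vec(A.T) @ vec(B))

# Identity (3.8): vec(ABC) = (C^T kron A) vec(B)
assert np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B))
```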
The next idea is to approximate the steering matrix around ρ0. A first-order
Taylor expansion gives

    a(θ, ρ) ≈ a0 + Dρ(ρ − ρ0),                                 (3.10)

where

    a0 = a(θ, ρ0),
    Dρ = [ ∂a(θ, ρ)/∂ρ1  ...  ∂a(θ, ρ)/∂ρn ] |{θ,ρ0}.         (3.11)

Since the derivatives of both a and the approximation in (3.10) with
respect to any parameter in θ or ρ are identical when evaluated at ρ0, it
can be shown that this approximation will not affect the asymptotic
properties of the estimates. Inserting (3.10) in (3.6) and using the resulting
cost function to replace VWSF in (3.2) leads to

    (N/σ̂²)(a0 + Dρρ̃)* M̂ (a0 + Dρρ̃) + (1/2) ρ̃ᵀ Ω⁻¹ ρ̃,        (3.12)

where ρ̃ = ρ − ρ0. This cost function is quadratic in ρ and can thus,
for fixed θ, be explicitly minimized with respect to ρ. Substituting the
solution for ρ back into (3.12) gives

    VMAP-NSF(θ) = a0* M̂ a0 − f̂ᵀ Γ̂⁻¹ f̂,                       (3.13)

where

    f̂ = Re{Dρ* M̂ a0},
    Γ̂ = Re{Dρ* M̂ Dρ + (σ̂²/2) Ω̄⁻¹}.                           (3.14)

The function in (3.13) depends on θ via a0 and Dρ. It is noted in
[VS94a] that Dρ (and M̂) can be replaced with a consistent estimate
without affecting the large sample second order properties.
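The quadratic elimination of ρ̃ leading to (3.13)–(3.14) can be checked numerically. The sketch below uses randomly generated stand-ins for M̂, a0, Dρ and Ω̄ (our own illustrative quantities, not from the thesis) and verifies that substituting the minimizer back into (3.12) reproduces the concentrated cost up to the factor N/σ̂²:

```python
import numpy as np

rng = np.random.default_rng(1)
n_rho, N, sig2 = 3, 100, 0.5
# Random Hermitian positive definite M and full-rank stand-ins.
X = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
M = X @ X.conj().T
a0 = rng.standard_normal(6) + 1j * rng.standard_normal(6)
D = rng.standard_normal((6, n_rho)) + 1j * rng.standard_normal((6, n_rho))
Obar = np.eye(n_rho)                      # stand-in for Omega_bar

def cost(rho):
    # The MAP-NSF cost (3.12), with Omega = Obar / N.
    r = a0 + D @ rho
    return (N / sig2) * np.real(r.conj() @ M @ r) \
        + 0.5 * N * rho @ np.linalg.inv(Obar) @ rho

f = np.real(D.conj().T @ M @ a0)          # f in (3.14)
Gam = np.real(D.conj().T @ M @ D) + (sig2 / 2) * np.linalg.inv(Obar)
rho_opt = -np.linalg.solve(Gam, f)        # explicit minimizer over rho
v_conc = np.real(a0.conj() @ M @ a0) - f @ np.linalg.solve(Gam, f)  # (3.13)
assert np.isclose(cost(rho_opt), (N / sig2) * v_conc)
```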
The following procedure is suggested by the derivations above:

1. Compute the eigendecomposition of R̂. Obtain a consistent estimate
   of θ using, for example, MUSIC.

2. Based on the estimate from the first step, compute consistent
   estimates of Dρ and M̂. Insert these estimates in the cost function
   (3.13), which thus depends on θ only via a0. Take the minimizing
   argument of the resulting cost function as the MAP-NSF estimate
   of θ.

In [VS94a], the statistical properties of the MAP-NSF estimate of θ are
analyzed. It is shown that MAP-NSF is a consistent and asymptotically
efficient estimator if d′ = d. This means that the asymptotic covariance
matrix of the estimation error in θ is equal to the Cramér-Rao lower
bound given in the next section.
3.3.4 Cramér-Rao Lower Bound

In [WOV91, VS94a] an asymptotically valid Cramér-Rao lower bound is
derived for the problem of interest herein. Below we give the lower bound
on the signal parameters only.

Theorem 3.1. Assuming that θ̂ is an asymptotically unbiased estimate
of θ0, then for large N,

    E{(θ̂ − θ0)(θ̂ − θ0)ᵀ} ≥ CRBθ ≜ (σ²/2N) [ C − Fθᵀ Γ⁻¹ Fθ ]⁻¹,   (3.15)

where¹

    C = Re{Dθ* M Dθ},
    M = Uᵀ ⊗ Π⊥A,                                              (3.16)
    Dθ = [ ∂a(θ, ρ)/∂θ1  ...  ∂a(θ, ρ)/∂θd ],
    Fθ = Re{Dρ* M Dθ},                                         (3.17)
    Γ = Re{Dρ* M Dρ + (σ²/2) Ω̄⁻¹}.                             (3.18)

The above expressions are evaluated at θ0 and ρ0.

Proof. See [WOV91, VS94a].

It is important to notice that the CRB for the case with no calibration
errors is σ²C⁻¹/2N and clearly is a lower bound for the MAP-CRB.
3.4 Generalized Weighted Subspace Fitting (GWSF)

It was shown in [VS94a] that the MAP-NSF method is asymptotically
statistically efficient. One drawback with MAP-NSF is that it is not
consistent if the sources are fully correlated, that is, if d′ < d. This
problem can be overcome in the signal subspace formulation. In this
section, we derive a new method in the signal subspace fitting (SSF) class
of algorithms. The method will be shown to possess the same large sample
performance as MAP-NSF and is thus also asymptotically efficient. A
further advantage of the proposed SSF approach is that the cost function
depends on the parameters in a relatively simple manner and, if the
nominal array is uniform and linear, the non-linear minimization can be
replaced by a rooting technique (see Section 3.6).

According to the problem formulation, R̂ − R = Op(1/√N), where

    R = E{x(t) x*(t)} = A0 P A0* + σ² I

and A0 = A(θ0, ρ0). This implies that Ês → Es = A0 T at the same
rate, where T is a d × d′ full rank matrix. The idea of the proposed
method is to minimize a suitable norm of the residuals

    ε = vec(B*(θ) Ês),

¹Notice that M is the limit of M̂ defined in (3.9) if d′ = d.
where B(θ) is an m × (m − d) full rank matrix whose columns span the
null-space of A*(θ). This implies that B*(θ)A(θ) = 0 and B*(θ0)Es = 0.

Remark 3.1. For general arrays, there is no known closed form
parameterization of B in terms of θ. However, this will not introduce any
problems since, as we show in Section 3.6, the final criterion can be written
as an explicit function of θ.

For general array error models, we have to consider the real and
imaginary parts of ε separately to get optimal (minimum variance)
performance. Equivalently, we may study ε and its complex conjugate, which
we denote by εᶜ (cf. Section 2.3.5). We believe that the analysis of the
method is clearer if ε and εᶜ are used instead of the real and imaginary
parts. However, from an implementational point of view, it may be more
suitable to use real arithmetic. Before presenting the estimation criterion,
we define:
    ês = [ vec(Ês)  ]
         [ vec(Êsᶜ) ],

    Ā = [ Id′ ⊗ A        0     ]
        [    0        Id′ ⊗ Aᶜ ],

    B̄ = [ Id′ ⊗ B        0     ]
        [    0        Id′ ⊗ Bᶜ ],

and

    ε̄ = [ ε  ]
         [ εᶜ ] = B̄* ês.

Notice that the columns of B̄ span the null-space of Ā*.
We propose to estimate the signal parameters θ as follows:

    θ̂ = arg min_θ VGWSF(θ),                                    (3.19)
    VGWSF(θ) = ε̄*(θ) W ε̄(θ),                                   (3.20)

where W is a positive definite weighting matrix. The method will in the
sequel be referred to as the generalized weighted subspace fitting (GWSF)
method. To derive the weighting that leads to minimum variance estimates
of θ, we need to compute the residual covariance matrix. As shown
in Appendix A.2, the (asymptotic) second order moment of the residual
vector ε̄ at θ0 can be written as

    Cε̄ = lim{N→∞} N E{ε̄ ε̄*}                                   (3.21)
       = L̄ + Ḡ Ḡ*,                                             (3.22)

where we have defined

    L̄ = [ L   0  ]
        [ 0   Lᶜ ],                                            (3.23)

    L = σ² ( Λ̃⁻² Λs ⊗ B* B ),

    Ḡ = [ G  ]
        [ Gᶜ ],                                                (3.24)

    G = ( Tᵀ ⊗ B* ) Dρ Ω̄^{1/2}.

Here, Ω̄^{1/2} is a (symmetric) square root of Ω̄ and T = A0† Es. It is easy
to see that Cε̄ is positive definite since L is. It is then well known that
(in the class of estimators based on ε̄) the optimal choice of the weighting
in terms of minimizing the parameter estimation error variance is

    WGWSF = Cε̄⁻¹                                               (3.25)
(see, e.g., Section 2.3.5 and [SS89]). In the next section it is proven
that this weighting in fact also gives asymptotically (in N) minimum
variance estimates (in the class of unbiased estimators) for the estimation
problem under consideration. The implementation of GWSF is discussed
in Section 3.6. We conclude this section with two remarks regarding the
GWSF formulation.

Remark 3.2. If there are no model errors, then GWSF reduces to the
WSF estimator (3.3). This can easily be verified by setting Ω̄ = 0 and
rewriting the GWSF criterion (3.20).

Remark 3.3. For the case of model errors only (N → ∞ or σ² → 0),
GWSF becomes the "model errors only" algorithm [SK93]. The results
obtained for GWSF are also consistent with the weightings given in [VS94b]
for special array error models.

Thus, the GWSF method can be considered to be optimal in general,
and not just when the model errors and the finite sample effects are of
the same order.
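The structure of the GWSF residual and criterion can be sketched numerically. Since no closed form parameterization of B(θ) is known for general arrays (Remark 3.1), the sketch below obtains a null-space basis via the SVD; the ULA `steering` helper, the identity weighting (a placeholder for the optimal choice (3.25)), and all numerical values are our own illustrative assumptions:

```python
import numpy as np

def steering(m, thetas, delta=0.5):
    # Nominal ULA steering matrix (illustrative array choice).
    k = np.arange(m)[:, None]
    return np.exp(2j * np.pi * delta * k * np.sin(np.asarray(thetas))[None, :])

def null_basis(A):
    # Columns span the null-space of A*, i.e. B* A = 0 (numerical, not closed form).
    U, _, _ = np.linalg.svd(A)
    return U[:, A.shape[1]:]

def gwsf_cost(Es, thetas, W=None):
    m = Es.shape[0]
    B = null_basis(steering(m, thetas))
    eps = (B.conj().T @ Es).reshape(-1, order="F")   # epsilon = vec(B*(theta) Es_hat)
    eps_bar = np.concatenate([eps, eps.conj()])      # stacked [epsilon; epsilon^c]
    if W is None:
        W = np.eye(eps_bar.size)                     # placeholder, not the optimal (3.25)
    return np.real(eps_bar.conj() @ W @ eps_bar)     # criterion (3.20)
```

In the noise-free case Es spans the column space of A0, so the cost vanishes exactly at θ0 and is positive for perturbed angles.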
3.5 Performance Analysis

The asymptotic properties of the GWSF estimates are analyzed in this
section. It is shown that the estimates are consistent and have a limiting
Gaussian distribution. It is also shown that the asymptotic covariance
matrix of the estimation error is equal to the CRB.

3.5.1 Consistency

The cost function (3.20) converges uniformly in θ with probability one
(w.p.1) to the limit

    V̄GWSF(θ) = [ vec*(B*(θ)Es)  vecᵀ(B*(θ)Es) ] W [ vec(B*(θ)Es)  ]
                                                   [ vecᶜ(B*(θ)Es) ].   (3.26)

This implies that θ̂ converges to the minimizing argument of (3.26).
Since W is positive definite, V̄GWSF(θ) ≥ 0 with equality if and only
if B*(θ)Es = 0. However, it is proven in [WZ89] that this uniquely
determines θ0 if d < (m + d′)/2. We have thus proven that θ̂ in (3.19)
tends to θ0 w.p.1 as N → ∞.
3.5.2 Asymptotic Distribution

In this section we derive the asymptotic distribution of the GWSF
estimates (3.19). As already mentioned, there exists a certain weighting
matrix (3.25) that minimizes the estimation error variance. This matrix
depends on the true parameters and, in practice, has to be replaced with
a consistent estimate. However, this will not change the second order
asymptotic properties considered herein. In fact, it is easy to see below
that the derivative of the cost function with respect to W only leads to
higher order terms. Thus, whether W is considered as a fixed matrix
or a consistent estimate thereof does not matter for the analysis in this
section.

The estimate that minimizes (3.20) satisfies by definition V′(θ̂) = 0,
and since θ̂ is consistent, a Taylor series expansion of V′(θ̂) around θ0
leads to

    θ̃ = −H⁻¹ V′ + op(1/√N),                                    (3.27)

where θ̃ = θ̂ − θ0, H is the limiting Hessian, and V′ is the gradient of
V(θ). Both V′ and H are evaluated at θ0. In the following, let the index
i denote the partial derivative with respect to θi. The derivative of V(θ)
is given by

    V′i = ε̄i* W ε̄ + ε̄* W ε̄i = 2 Re{ε̄i* W ε̄}.                  (3.28)

It is straightforward to verify that ε̄i* W ε̄ is real for any weighting matrix
having the form

    W = [ W11   W12  ]
        [ W12ᶜ  W11ᶜ ].                                         (3.29)
In particular, the matrix WGWSF in (3.25) has this structure. Next notice
that ε̄ = B̄* ês and thus

    ε̄i = B̄i* ês = B̄i* es + op(1),                               (3.30)

where es denotes the limit of ês. Define the vector

    ki = B̄i* es = [ vec(Bi* Es)  ]
                  [ vecᶜ(Bi* Es) ],

and insert (3.30) in (3.28) to show that

    V′i ≃ 2 ki* W ε̄,                                            (3.31)

for any W satisfying (3.29), where ≃ represents an equality up to first
order; that is, terms of order op(1/√N) are omitted. Recall the relation
(A.5) and that √N(R̂ − R) is asymptotically Gaussian by the central
limit theorem. By (3.27) and (3.31) we thus conclude that the normalized
estimation error has a limiting Gaussian distribution

    √N θ̃ ∈ AsN(0, H⁻¹ Q H⁻¹),

where

    Q = lim{N→∞} N E{V′ V′ᵀ},                                   (3.32)

and where the notation AsN means asymptotically Gaussian distributed.
In summary we have the following result.

Theorem 3.2. If d < (m + d′)/2, then the GWSF estimate (3.19) tends
to θ0 w.p.1 as N → ∞, and

    √N(θ̂ − θ0) ∈ AsN(0, H⁻¹ Q H⁻¹).
If W = WGWSF given in (3.25), then

    H⁻¹ Q H⁻¹ = N CRBθ.

Expressions for H and Q can be found in Appendix A.3 and CRBθ is
given in (3.15). All quantities are evaluated at θ0.

Proof. The result follows from the analysis above and the calculations in
Appendix A.3.

This result is similar to what was derived for the MAP-NSF method
in [VS94a]. It implies that GWSF and MAP-NSF are asymptotically
equivalent and efficient for large N and small Ω (= Ω̄/N). An advantage
of GWSF as compared to MAP-NSF is that GWSF is consistent even if
P is rank deficient (d′ < d). Another advantage is that the minimization
of VGWSF(θ) can be performed without a search if the nominal array
is uniform and linear, as we show in Section 3.6. In short, GWSF is
a generalization of WSF [VO91] and MODE [SS90a] that allows one to
include a priori knowledge of the array perturbations into the estimation
criterion.
3.6 Implementation

We will in this section discuss in some detail the implementation of the
GWSF estimator. Recall that the GWSF criterion is written in terms
of the matrix B̄(θ) and that the parameterization of B̄ in general is
not known. In this section we first rewrite the GWSF criterion so that
it becomes explicitly parameterized by θ. This function can then be
minimized numerically to give the GWSF estimate of θ. Next, in Section
3.6.2, we present the implementation for the common special case
when the nominal array is uniform and linear. For this case, it is possible
to implement GWSF without the need for gradient search methods.
Thus, GWSF is an interesting direction of arrival estimator which combines
statistical efficiency with computational simplicity (relative to other
efficient estimators).

3.6.1 Implementation for General Arrays

The GWSF cost function in (3.20) reads

    VGWSF(θ) = ε̄*(θ) W ε̄(θ) = ês* B̄(θ) W B̄*(θ) ês.            (3.33)
What complicates matters is that the functional form of B̄(θ) is not
known for general array geometries. However, if the optimal weighting
in (3.25) is used in (3.33), then it is possible to rewrite (3.33) so that it
depends explicitly on θ. To see how this can be done, we will use the
following two lemmas.
Lemma 3.1. Let X and Y be two full column rank matrices of dimension
m × n and m × (m − n), respectively. If X*Y = 0, then

    [ X  Y ]⁻¹ = [ X† ]
                 [ Y† ],

where (·)† denotes the Moore-Penrose pseudo-inverse of the matrix in
question (X† = (X*X)⁻¹X*).

Proof. By direct verification we have

    [ X† ] [ X  Y ] = [ X†X  X†Y ] = Im,
    [ Y† ]            [ Y†X  Y†Y ]

since X*Y = 0 implies X†Y = 0 and Y†X = 0.
Lemma 3.2. Let X (m × n) and Y (m × (m − n)) be two full column
rank matrices. Furthermore, assume that X*Y = 0, and let C and D be
two non-singular matrices. Then

    X C⁻¹ X* = ΠX [ X†* C X† + Y†* D Y† ]⁻¹ ΠX,

where ΠX = X(X*X)⁻¹X* is the projection matrix projecting orthogonally
onto the range-space of X.

Proof. Note that ΠX Y = 0. The lemma is proven by the following set
of equalities (diag(·,·) denotes a block-diagonal matrix):

    X C⁻¹ X* = ΠX X C⁻¹ X* ΠX
             = ΠX [ X  Y ] diag(C⁻¹, D⁻¹) [ X  Y ]* ΠX
             = ΠX { ([ X  Y ]*)⁻¹ diag(C, D) [ X  Y ]⁻¹ }⁻¹ ΠX
             = ΠX { [ X†*  Y†* ] diag(C, D) [ X† ] }⁻¹ ΠX
                                            [ Y† ]
             = ΠX [ X†* C X† + Y†* D Y† ]⁻¹ ΠX,

where use was made of Lemma 3.1 in the fourth step.
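Both lemmas are easy to verify numerically. The following sketch checks the identity of Lemma 3.2 for a random X, a Y spanning its orthogonal complement, and (for simplicity) diagonal non-singular C and D; all numerical choices are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 6, 2
# Random full column rank X, and Y spanning its orthogonal complement (X* Y = 0).
X = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
U, _, _ = np.linalg.svd(X)
Y = U[:, n:]
# Non-singular C (n x n) and D ((m-n) x (m-n)); diagonal for simplicity.
C = np.diag(rng.uniform(1, 2, n)).astype(complex)
D = np.diag(rng.uniform(1, 2, m - n)).astype(complex)

pinv = lambda M: np.linalg.solve(M.conj().T @ M, M.conj().T)  # (M*M)^{-1} M*
PiX = X @ pinv(X)                                             # orthogonal projector onto range(X)
lhs = X @ np.linalg.inv(C) @ X.conj().T
mid = pinv(X).conj().T @ C @ pinv(X) + pinv(Y).conj().T @ D @ pinv(Y)
rhs = PiX @ np.linalg.inv(mid) @ PiX
assert np.allclose(lhs, rhs)
```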
Now apply Lemma 3.2 to the GWSF criterion (3.33). With reference
to Lemma 3.2, take X = B̄, Y = Ā, C = W⁻¹ and, for example,
D = (Ā*Ā)² to show that

    VGWSF(θ) = ês* Π⊥Ā W̃ Π⊥Ā ês,                               (3.34)

where

    W̃ = [ B̄†* W⁻¹ B̄† + Ā Ā* ]⁻¹.                              (3.35)

In (3.34), we used the fact that ΠB̄ = Π⊥Ā. Let W be chosen according
to (3.25). It is then readily verified that

    B̄†* W⁻¹ B̄† = B̄†* Cε̄ B̄† = [ L̃   0  ] + [ G̃  ] [ G̃*  G̃ᵀ ],   (3.36)
                               [ 0   L̃ᶜ ]   [ G̃ᶜ ]

where

    L̃ = σ² ( Λ̃⁻² Λs ⊗ Π⊥A ),
    G̃ = ( Tᵀ ⊗ Π⊥A ) Dρ Ω̄^{1/2}.
Combining (3.36), (3.35) and (3.34), we see that the criterion is now
explicitly parameterized by θ. The GWSF algorithm for general array
geometries can be summarized as follows:

1. Based on R̂, compute Ês, Λ̂s, and

       σ̂² = (1/(m − d′)) [ Tr(R̂) − Tr(Λ̂s) ],
       Λ̃ = Λ̂s − σ̂² I.

2. The GWSF estimate of θ is then given by

       θ̂ = arg min_θ ês* Π⊥Ā W̃ Π⊥Ā ês,

   where W̃ is given by (3.35) and (3.36) with the sample estimates
   from the first step inserted in place of the true quantities.
This results in a non-linear optimization problem where θ enters in a
complicated way. From the analysis in Section 3.5 it follows that the
asymptotic second order properties of θ̂ are not affected if the weighting
matrix W̃ is replaced by a consistent estimate. Hence, if a consistent
estimate of θ0 is used to compute W̃, then the criterion in the second
step only depends on θ via Π⊥Ā. This may lead to some computational
savings, but it still results in a non-linear optimization problem. In the
next section, we turn to the common special case of a uniform linear
array, for which it is possible to avoid gradient search methods.
3.6.2 Implementation for Uniform Linear Arrays

In this section, we present a way to significantly simplify the implementation
of GWSF if the nominal array is linear and uniform. The form of the
estimator is in the spirit of IQML [BM86] and MODE [SS90b, SS90a].
Many of the details concerning the implementation are borrowed from
[SS90b]. Define the following polynomial with d roots on the unit circle
corresponding to the DOAs:

    b0 z^d + b1 z^{d−1} + · · · + bd = b0 ∏_{k=1}^{d} (z − e^{jωk}),  b0 ≠ 0,   (3.37)

where ωk = 2πδ sin(θk) and where δ is the separation between the sensors
measured in wavelengths. The idea is to re-parameterize the minimization
problem in terms of the polynomial coefficients instead of θ. As will
be seen below, this leads to a considerable computational simplification.
Observe that the roots of (3.37) lie on the unit circle. A common way to
impose this constraint on the coefficients in the estimation procedure is
to enforce the conjugate symmetry property

    b_k = b^c_{d−k},  k = 0, ..., d.                            (3.38)

This only approximately ensures that the roots are on the unit circle.
However, notice that if zi is a root of (3.37) with conjugate symmetric
coefficients, then so is 1/zᵢᶜ. Assume now that we have a set of perturbed
polynomial coefficients satisfying (3.38). As long as the perturbation is
small enough, the d roots of (3.37) will be on the unit circle since the true
DOAs are distinct. That is, for large enough N, the conjugate symmetry
property in fact guarantees that the roots are on the unit circle and the
asymptotic properties of the estimates will not be affected.
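The parameterization (3.37)–(3.38) can be illustrated numerically: construct the polynomial from a set of DOAs, rescale b0 by a unit-modulus factor so that the conjugate symmetry constraint holds (always possible when all roots lie on the unit circle), and recover the DOAs by rooting. The DOA values and spacing below are arbitrary illustrative choices:

```python
import numpy as np

d, delta = 3, 0.5
thetas = np.array([-0.4, 0.1, 0.5])        # distinct DOAs in radians (illustrative)
omega = 2 * np.pi * delta * np.sin(thetas)

# b(z) = b0 * prod_k (z - e^{j omega_k}); np.poly returns monic coefficients,
# highest power first.
b = np.poly(np.exp(1j * omega))

# Rescale by a unit-modulus b0 so that the conjugate symmetry (3.38) holds:
# b_k = conj(b_{d-k}) for all k.
b = b * np.exp(-0.5j * np.angle(b[-1] / b[0]))
assert np.allclose(b, np.conj(b[::-1]))

# Recover the DOAs from the coefficients by rooting, as in the MODE/IQML spirit.
roots = np.roots(b)
thetas_hat = np.sort(np.arcsin(np.angle(roots) / (2 * np.pi * delta)))
assert np.allclose(thetas_hat, np.sort(thetas))
```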
Next, define the m × (m − d) Toeplitz matrix B as

    B* = [ bd  ···  b1  b0  0   ···  0  ]
         [ 0   bd  ···  b1  b0  ···  0  ]
         [ ⋮        ⋱       ⋱        ⋮  ]
         [ 0   ···  0   bd  ···  b1  b0 ].                      (3.39)

It is readily verified that B*A(θ) = 0, and since the rank of B is m − d
(by construction, since b0 ≠ 0), it follows that the columns of B span the
null-space of A*. This implies that we can use B in (3.39) in the GWSF
criterion, minimize with respect to the polynomial coefficients, and then
compute θ from the roots of the polynomial (3.37). In the following we
give the details on how this can be performed.
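A quick numerical check of the construction (3.39); the array size and DOAs are illustrative choices:

```python
import numpy as np

m, d, delta = 7, 2, 0.5
thetas = np.array([-0.3, 0.4])
omega = 2 * np.pi * delta * np.sin(thetas)
b = np.poly(np.exp(1j * omega))            # [b0, b1, ..., bd] with b0 = 1

# Banded Toeplitz B* of size (m-d) x m as in (3.39):
# row k holds [0..0, bd, ..., b1, b0, 0..0].
Bstar = np.zeros((m - d, m), dtype=complex)
for k in range(m - d):
    Bstar[k, k:k + d + 1] = b[::-1]        # reversed: bd first, b0 last

k_idx = np.arange(m)[:, None]
A = np.exp(1j * k_idx * omega[None, :])    # nominal ULA steering matrix
assert np.max(np.abs(Bstar @ A)) < 1e-10   # B* A(theta) = 0
```

Each row of B* times a steering vector evaluates the polynomial (3.37) at e^{jω}, which is zero by construction; this is why the rows span the null-space of A*.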
It is useful to rewrite the GWSF criterion as follows (cf. (3.20) and
(3.25)):

    VGWSF = ε̄* WGWSF ε̄ = [ vec(B* ÊW)  ]* W̄ [ vec(B* ÊW)  ]
                          [ vec(Bᵀ ÊWᶜ) ]     [ vec(Bᵀ ÊWᶜ) ],   (3.40)

where

    ÊW = (1/√σ̂²) Ês Λ̃ Λ̂s^{−1/2}.                               (3.41)

The blocks in W̄⁻¹ are given by

    W̄⁻¹ = [ (W̄⁻¹)11   (W̄⁻¹)12  ]
          [ (W̄⁻¹)12ᶜ  (W̄⁻¹)11ᶜ ],                              (3.42)

    (W̄⁻¹)11 = (Id′ ⊗ B*B) + ( T̄ᵀ ⊗ B* ) Dρ Ω̄ Dρ* ( T̄ᶜ ⊗ B ),
    (W̄⁻¹)12 = ( T̄ᵀ ⊗ B* ) Dρ Ω̄ Dρᵀ ( T̄ ⊗ Bᶜ ),

where T̄ = T Λ̃ Λ̂s^{−1/2} / √σ̂² = A† ÊW. Let ÊW•k denote column k and
ÊWi,j the (i, j)th element of ÊW, and define the vector

    b = [ b0  b1  ···  bd ]ᵀ.
Observe that

    B* ÊW•k = [ ÊWd+1,k  ÊWd,k    ···  ÊW1,k   ]
              [ ÊWd+2,k  ÊWd+1,k  ···  ÊW2,k   ]  b  ≜  S̃k b
              [ ⋮                      ⋮       ]
              [ ÊWm,k    ÊWm−1,k  ···  ÊWm−d,k ]

for k = 1, ..., d′, and define

    S = [ S̃1  ]
        [ ⋮   ]
        [ S̃d′ ].

Then we have

    vec(B* ÊW) = S b.
Let
\[
\beta = \begin{bmatrix} b_0 & \cdots & b_{\lfloor (d-1)/2 \rfloor} \end{bmatrix}^T,
\]
where ⌊·⌋ denotes the largest integer less than or equal to its argument. In view of the conjugate symmetry constraint (3.38),
\[
b = \begin{bmatrix} \beta \\ (b_{d/2}) \\ \tilde I \beta^c \end{bmatrix},
\]
where the real valued parameter b_{d/2} only appears if d is even. Here, \(\tilde I\) is the exchange matrix with ones along the anti-diagonal and zeros elsewhere. Partitioning \(S = [\,S_1 \; (s) \; S_2\,]\) conformably with b (the column s, like b_{d/2}, is present only for even d), we clearly have
\[
Sb = \begin{bmatrix} S_1 & (s) & S_2 \end{bmatrix}
\begin{bmatrix} \beta_r + j\beta_i \\ (b_{d/2}) \\ \tilde I(\beta_r - j\beta_i) \end{bmatrix}
= \begin{bmatrix} S_1 + S_2\tilde I & (s) & j(S_1 - S_2\tilde I) \end{bmatrix}
\begin{bmatrix} \beta_r \\ (b_{d/2}) \\ \beta_i \end{bmatrix}
\triangleq F\mu,
\]
where the subscripts r and i denote real and imaginary parts, respectively.
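The real reparameterization above can be sketched in code as follows; the helper names are illustrative, and the partition S = [S1 (s) S2] follows the display above. The sanity check verifies Sb = Fμ for random data.

```python
import numpy as np

def mu_to_b(mu, d):
    """Rebuild the conjugate-symmetric vector b from the real parameter
    vector mu = [beta_r; (b_{d/2}); beta_i] (b_{d/2} only if d is even)."""
    n = (d + 1) // 2                       # len(beta) = floor((d-1)/2) + 1
    beta = mu[:n] + 1j * mu[-n:]
    mid = [mu[n]] if d % 2 == 0 else []
    return np.concatenate([beta, mid, beta[::-1].conj()])

def build_F(S, d):
    """F such that S b = F mu, with S partitioned as [S1 (s) S2]."""
    n = (d + 1) // 2
    S1, S2 = S[:, :n], S[:, -n:]
    Itil = np.eye(n)[::-1]                 # exchange matrix
    mid = [S[:, n:n + 1]] if d % 2 == 0 else []
    return np.hstack([S1 + S2 @ Itil] + mid + [1j * (S1 - S2 @ Itil)])

# Sanity check with random data (d odd here):
d = 3
rng = np.random.default_rng(1)
S = rng.standard_normal((8, d + 1)) + 1j * rng.standard_normal((8, d + 1))
mu = rng.standard_normal(d + 1)            # d+1 free real parameters
assert np.allclose(S @ mu_to_b(mu, d), build_F(S, d) @ mu)
```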
With the notation introduced above, the criterion (3.40) can be written as
\[
V_{\text{GWSF}} = \mu^T
\begin{bmatrix} F \\ F^c \end{bmatrix}^{*}
\bar W
\begin{bmatrix} F \\ F^c \end{bmatrix}
\mu.
\]
To obtain a proper parameterization we should also include a non-triviality constraint to avoid the solution b = 0 (μ = 0). In [SS90b] it is proposed that either b_{0r} or b_{0i} can be set to one. In what follows we assume, for simplicity, that it is possible to put b_{0r} = μ_1 = 1. For particular cases this may not be possible. In such cases one should instead use the constraint b_{0i} = 1. As an alternative to these linear constraints, one can use a norm constraint on μ (see, e.g., [NK94]). The key advantage of the reformulation of the problem in terms of μ is that V_GWSF is quadratic in μ, and thus it can be minimized explicitly.
A consistent estimate of θ0 can be obtained by minimizing ‖vec(B*ÊW)‖², since B*Es = 0 uniquely determines θ0. Furthermore, this also implies that replacing the weighting matrix \(\bar W\) in (3.40) by a consistent estimate does not change the asymptotic distribution of the estimates. In summary, we have the following GWSF algorithm for uniform linear arrays:

1. Based on \(\hat R\), compute \(\hat E_s\), \(\hat\Lambda_s\), and
\[
\hat\sigma^2 = \frac{1}{m - d'} \left( \operatorname{Tr}(\hat R) - \operatorname{Tr}(\hat\Lambda_s) \right), \qquad
\hat{\tilde\Lambda} = \hat\Lambda_s - \hat\sigma^2 I.
\]

2. Using the notation introduced above, an initial estimate can be obtained by minimizing
\[
\|\operatorname{vec}(B^*\hat E_W)\|_2^2 = \|Sb\|_2^2 = \|F\mu\|_2^2 = \|J\eta + c\|_2^2
\]
over η ∈ R^d, where η, J and c are defined via
\[
F\mu = \begin{bmatrix} c & J \end{bmatrix} \begin{bmatrix} 1 \\ \eta \end{bmatrix} = J\eta + c.
\tag{3.43}
\]
The norm should be interpreted as ‖x‖² = x*x. Using the minimizer η̂ of (3.43), an estimate of b can be constructed. The initial estimate of θ0 is then found via the roots of (3.37). Notice that η̂ is the least-squares solution to the over-determined set of equations given in (3.43).
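Step 2 is an ordinary least-squares problem, but with complex J and c and a real unknown η; a standard way to handle this is to stack real and imaginary parts. A minimal sketch (hypothetical data):

```python
import numpy as np

def initial_eta(F):
    """Step 2: minimize ||J eta + c||^2 over real eta, where F = [c J].

    F is complex but eta is real, so the real and imaginary parts are
    stacked before solving the ordinary least-squares problem.
    """
    c, J = F[:, 0], F[:, 1:]
    A = np.vstack([J.real, J.imag])
    y = -np.concatenate([c.real, c.imag])
    eta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return eta

# With an exact solution planted, the minimizer is recovered:
rng = np.random.default_rng(2)
J = rng.standard_normal((10, 3)) + 1j * rng.standard_normal((10, 3))
eta0 = rng.standard_normal(3)
F = np.hstack([(-J @ eta0)[:, None], J])   # c = -J eta0
print(np.allclose(initial_eta(F), eta0))   # True
```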
3. Given \(\bar\Omega\) and an initial estimate \(\hat\theta\) (and \(\hat b\)), compute \(\hat{\bar T} = A^\dagger(\hat\theta)\hat E_W\), \(B(\hat b)\) and \(D_\rho(\hat\theta)\). Use these quantities to obtain a consistent estimate of \(\bar W^{-1}\) in (3.42). Let \(\bar W^{-1/2}\) denote the Cholesky factor of \(\bar W^{-1}\) such that \(\bar W^{-1} = \bar W^{-1/2}\bar W^{-*/2}\). Solve the following linear system of equations for F_w:
\[
\bar W^{-1/2} F_w = \begin{bmatrix} F \\ F^c \end{bmatrix},
\]
and define J_w and c_w via
\[
F_w \mu = \begin{bmatrix} c_w & J_w \end{bmatrix} \begin{bmatrix} 1 \\ \eta \end{bmatrix} = J_w \eta + c_w.
\]
The solution of this step is given by
\[
\hat\eta = \arg\min_{\eta \in \mathbb{R}^d} \|J_w \eta + c_w\|_2^2.
\]
The GWSF estimate \(\hat\theta\) is then constructed from \(\hat\eta\) as described before.
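The whitening and weighted least-squares part of step 3 can be sketched as below; the dimensions and the positive definite weighting are hypothetical placeholders for a consistent estimate of \(\bar W^{-1}\).

```python
import numpy as np

def gwsf_eta(F, Winv):
    """Step 3 sketch: whiten the stacked regressor with the Cholesky
    factor of (an estimate of) Winv, then solve the real LS problem."""
    G = np.vstack([F, F.conj()])               # [F; F^c]
    L = np.linalg.cholesky(Winv)               # Winv = L @ L^*
    Fw = np.linalg.solve(L, G)                 # L Fw = [F; F^c]
    cw, Jw = Fw[:, 0], Fw[:, 1:]
    A = np.vstack([Jw.real, Jw.imag])
    y = -np.concatenate([cw.real, cw.imag])
    eta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return eta

# Hypothetical dimensions; Winv must be Hermitian positive definite.
rng = np.random.default_rng(3)
F = rng.standard_normal((6, 3)) + 1j * rng.standard_normal((6, 3))
X = rng.standard_normal((12, 12)) + 1j * rng.standard_normal((12, 12))
Winv = X @ X.conj().T + 12 * np.eye(12)
eta = gwsf_eta(F, Winv)
print(eta.shape)                               # (2,)
```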
In finite samples and difficult scenarios it may be useful to reiterate the third step several times; however, this does not improve the asymptotic statistical properties of the estimate.

In this section, we have shown that GWSF can be implemented in a very attractive manner if the nominal array is uniform and linear. In particular, the solution can be obtained in "closed form" by solving a set of linear systems of equations and rooting the polynomial in (3.37). Thus, there is no need for an iterative optimization procedure like that necessary in, for example, MAP-NSF.
3.7 Performance Analysis of MAPprox<br />
The MAPprox cost function (3.5) was derived by a second order approximation of the MAP-WSF cost function (3.2) around ρ0. The MAPprox estimates are thus expected to have a performance similar to ML. This has also been observed in simulations [WOV91, VS94a]. However, a formal analysis of the asymptotic properties of MAPprox has not been conducted. In this section we show that, using asymptotic approximations, the MAPprox cost function in fact can be rewritten to coincide with the GWSF cost function. This establishes the observed asymptotic efficiency of the MAPprox method.

Consider the normalized MAPprox cost function (cf. (3.5))
\[
\frac{1}{N} V_{\text{MAPprox}}(\theta) = V(\theta) - \frac{1}{2}\, \partial_\rho V^T(\theta) \left( \partial_{\rho\rho} V(\theta) + \bar\Omega^{-1} \right)^{-1} \partial_\rho V(\theta),
\tag{3.44}
\]
where
\[
V(\theta) = V(\theta, \rho_0)
\tag{3.45}
\]
and
\[
V(\theta,\rho) = \frac{1}{\hat\sigma^2}\operatorname{Tr}\{\Pi^\perp_{A(\theta,\rho)} \hat E_s \hat W_{\text{WSF}} \hat E_s^*\}.
\tag{3.46}
\]
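A criterion of the form (3.46) is straightforward to evaluate numerically. The sketch below (illustrative names and data, half-wavelength steering not implied) shows the projection-based cost and checks that it vanishes when the signal eigenvectors lie in the span of A.

```python
import numpy as np

def wsf_cost(A, Es, W, sigma2):
    """Evaluate the WSF-type criterion (1/sigma^2) Tr{Pi_A^perp Es W Es*}.

    A: m x d steering matrix, Es: m x d signal eigenvectors,
    W: d x d weighting, sigma2: noise-power estimate.
    """
    # Orthogonal projector onto the complement of the column space of A.
    Pi = np.eye(A.shape[0]) - A @ np.linalg.pinv(A)
    return np.real(np.trace(Pi @ Es @ W @ Es.conj().T)) / sigma2

# The cost vanishes when Es lies in the span of A:
m, d = 8, 2
rng = np.random.default_rng(4)
A = rng.standard_normal((m, d)) + 1j * rng.standard_normal((m, d))
Es, _ = np.linalg.qr(A)                    # orthonormal basis of span(A)
print(wsf_cost(A, Es, np.eye(d), 1.0))     # ~0
```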
The gradient and the Hessian in (3.44) are evaluated at ρ0. One problem with V_MAPprox(θ) is that the inverse (∂ρρV(θ) + Ω̄⁻¹)⁻¹ may not exist for all θ. This implies that the consistency of the MAPprox estimate cannot be guaranteed. However, locally around θ0 the function is well behaved. In the following we will approximate (3.44) around θ0. The approximation retains the second order behavior of V_MAPprox(θ) around θ0 and is well defined for all θ, implying that the consistency problem is eliminated. More precisely, we will study (3.44) for θ such that θ − θ0 = Op(1/√N).
Let us start by deriving the gradient ∂ρV. In what follows we write Π⊥ in place of Π⊥_A, for simplicity of notation. The derivative of (3.46) with respect to ρi is
\[
\partial_{\rho_i} V = \frac{1}{\hat\sigma^2}\operatorname{Tr}\{\Pi^\perp_i \hat E_s \hat W_{\text{WSF}} \hat E_s^*\}
= -\frac{2}{\hat\sigma^2}\operatorname{Re}\left[\operatorname{Tr}\{\Pi^\perp A_i A^\dagger \hat E_s \hat W_{\text{WSF}} \hat E_s^*\}\right],
\]
where we have used the following result:
\[
\Pi^\perp_i = -\Pi^\perp A_i A^\dagger - A^{\dagger*} A_i^* \Pi^\perp.
\]
Using the definition (3.11) and the formula (3.7), we can write
\[
\partial_\rho V(\theta) = \partial_\rho V\big|_{\theta,\rho_0}
= -\frac{2}{\hat\sigma^2}\operatorname{Re}\left[ D_\rho^* \operatorname{vec}(\Pi^\perp \hat E_s \hat W_{\text{WSF}} \hat E_s^* A^{\dagger*}) \right]
= O_p(1/\sqrt N).
\tag{3.47}
\]
Consider next the (i,j)th element of the Hessian matrix:
\[
\partial_{\rho_i\rho_j} V = -\frac{2}{\hat\sigma^2}\operatorname{Re}\Big[
\operatorname{Tr}\{\underbrace{\Pi^\perp_j A_i A^\dagger \hat E_s \hat W_{\text{WSF}} \hat E_s^*}_{O_p(1)}
+ \underbrace{\Pi^\perp A_{ij} A^\dagger \hat E_s \hat W_{\text{WSF}} \hat E_s^*}_{O_p(1/\sqrt N)}
+ \underbrace{\Pi^\perp A_i A^\dagger_j \hat E_s \hat W_{\text{WSF}} \hat E_s^*}_{O_p(1/\sqrt N)}\}\Big].
\tag{3.48}
\]
Notice that
\[
\frac{1}{N} V_{\text{MAPprox}}
= \underbrace{V}_{O_p(1/N)}
- \frac{1}{2}\, \underbrace{\partial_\rho V^T}_{O_p(1/\sqrt N)}
\underbrace{(\underbrace{\partial_{\rho\rho} V}_{O_p(1)} + \underbrace{\bar\Omega^{-1}}_{O(1)})^{-1}}_{O(1)}
\underbrace{\partial_\rho V}_{O_p(1/\sqrt N)}
= O_p(1/N).
\]
All of the order relations above are valid locally around θ0. Next, study the derivative of (1/N)V_MAPprox with respect to θi (here, an index i denotes a partial derivative with respect to θi):
\begin{align}
\frac{\partial}{\partial\theta_i}\frac{1}{N}V_{\text{MAPprox}}
={}& \underbrace{V_i}_{O_p(1/\sqrt N)}
+ \frac{1}{2}\underbrace{\partial_\rho V^T}_{O_p(1/\sqrt N)}
\underbrace{(\underbrace{\partial_{\rho\rho}V}_{O_p(1)} + \underbrace{\bar\Omega^{-1}}_{O(1)})^{-1}}_{O(1)}
\underbrace{\partial_{\rho\rho}V_i}_{O_p(1)}
(\partial_{\rho\rho}V + \bar\Omega^{-1})^{-1}
\underbrace{\partial_\rho V}_{O_p(1/\sqrt N)} \nonumber\\
&- \frac{1}{2}\underbrace{\partial_\rho V^T}_{O_p(1/\sqrt N)}
(\partial_{\rho\rho}V + \bar\Omega^{-1})^{-1}
\underbrace{\partial_\rho V_i}_{O_p(1)}
- \frac{1}{2}\underbrace{\partial_\rho V_i^T}_{O_p(1)}
(\partial_{\rho\rho}V + \bar\Omega^{-1})^{-1}
\underbrace{\partial_\rho V}_{O_p(1/\sqrt N)}.
\tag{3.49}
\end{align}
Since the term containing ∂ρρVi is of order Op(1/N), we conclude that ∂ρρV can be replaced by a consistent estimate without affecting the asymptotic second order properties. This implies in particular that we can neglect the last two terms in (3.48) and only retain the first. (Notice that all three terms in ∂ρρVi are of order Op(1). If the second term in (3.49) were Op(1/√N), then all three terms in (3.48) would have contributed.) Similarly, it can be shown that asymptotically the two last terms in (3.48) do not affect the second order derivatives of V_MAPprox/N
with respect to θ. Recall (3.48) and rewrite it as
\begin{align}
\partial_{\rho_i\rho_j} V
&= -\frac{2}{\hat\sigma^2}\operatorname{Re}\left[\operatorname{Tr}\{\Pi^\perp_j A_i A^\dagger \hat E_s \hat W_{\text{WSF}} \hat E_s^*\}\right] + o_p(1) \nonumber\\
&= -\frac{2}{\hat\sigma^2}\operatorname{Re}\left[\operatorname{Tr}\{(-\Pi^\perp A_j A^\dagger - A^{\dagger*} A_j^* \Pi^\perp) A_i A^\dagger \hat E_s \hat W_{\text{WSF}} \hat E_s^*\}\right] + o_p(1) \nonumber\\
&= \frac{2}{\hat\sigma^2}\operatorname{Re}\left[\operatorname{Tr}\{A^{\dagger*} A_j^* \Pi^\perp A_i A^\dagger \hat E_s \hat W_{\text{WSF}} \hat E_s^*\}\right] + o_p(1) \nonumber\\
&= \frac{2}{\hat\sigma^2}\operatorname{Re}\left[\operatorname{vec}^*(A_i)\left( \left(A^\dagger \hat E_s \hat W_{\text{WSF}} \hat E_s^* A^{\dagger*}\right)^T \otimes \Pi^\perp \right)\operatorname{vec}(A_j)\right] + o_p(1).
\tag{3.50}
\end{align}
The dominating term of (3.50) can be written in matrix form as
\[
\tilde V_{\rho\rho}(\theta) \triangleq \frac{2}{\hat\sigma^2}\operatorname{Re}\left[ D_\rho^* \left( \left(A^\dagger \hat E_s \hat W_{\text{WSF}} \hat E_s^* A^{\dagger*}\right)^T \otimes \Pi^\perp \right) D_\rho \right]
= \frac{2}{\hat\sigma^2}\operatorname{Re}\left[ D_\rho^* \hat M D_\rho \right].
\tag{3.51}
\]
If \(\tilde V_{\rho\rho}(\theta)\) is used in lieu of ∂ρρV(θ) in (3.44), we get the following alternate cost function:
\[
V_{\text{MAPprox2}}(\theta) = V(\theta) - \frac{1}{2}\,\partial_\rho V^T(\theta) \left( \tilde V_{\rho\rho}(\theta) + \bar\Omega^{-1} \right)^{-1} \partial_\rho V(\theta).
\]
It can be seen that (3.51) is non-negative definite, which ensures that the inverse \((\tilde V_{\rho\rho}(\theta) + \bar\Omega^{-1})^{-1}\) always exists. Furthermore, we have (cf. (3.14))
\[
\tilde V_{\rho\rho}(\theta) + \bar\Omega^{-1} = \frac{2}{\hat\sigma^2}\,\hat\Gamma.
\]
It can also be verified that V(θ) and ∂ρV(θ), as defined in (3.45) and (3.47), can be written as
\begin{align*}
V(\theta) &= \frac{1}{2}\,\bar\varepsilon^* \hat{\bar L}^{-1} \bar\varepsilon,\\
\partial_\rho V(\theta) &= -\bar\Omega^{-1/2}\,\hat{\bar G}^* \hat{\bar L}^{-1} \bar\varepsilon,
\end{align*}
where \(\hat{\bar L}\) and \(\hat{\bar G}\) are defined similarly to (3.23) and (3.24) but computed with sample data quantities and in θ (not in θ0). This implies that
\[
V_{\text{MAPprox2}}(\theta) = \frac{1}{2}\,\bar\varepsilon^* \hat W_{M2}\,\bar\varepsilon,
\]
where
\[
\hat W_{M2} = \hat{\bar L}^{-1} - \hat{\bar L}^{-1} \hat{\bar G}\,\bar\Omega^{-1/2}\,\frac{\hat\sigma^2}{2}\,\hat\Gamma^{-1}\,\bar\Omega^{-1/2}\,\hat{\bar G}^* \hat{\bar L}^{-1}.
\tag{3.52}
\]
The MAPprox2 cost function can thus be written in the same form as the GWSF criterion (see (3.20)). The weighting matrix in (3.52) depends on θ and on sample data quantities. However, in Section 3.5 it was shown that any two weighting matrices related as W1 = W2 + op(1) give the same asymptotic performance. This implies that in the following we can consider the "limit" of (3.52) in lieu of \(\hat W_{M2}\). In what follows, let W_M2 denote \(\hat W_{M2}\) evaluated in θ0 and for infinite data. Thus, if we can show that
\[
W_{M2} \propto W_{\text{GWSF}},
\]
then we have proven that MAPprox2 has the same asymptotic performance as GWSF, and hence that MAPprox2 is asymptotically efficient. (The symbol ∝ denotes equality up to a multiplicative scalar constant.) Combining (3.25), (A.19), and (A.21) it is immediately clear that
\[
W_{M2} = W_{\text{GWSF}}.
\]
In this section, the MAPprox cost function was approximated so as to retain the local properties that affect the estimation error variance. The approximation (MAPprox2) was then shown to coincide with the GWSF cost function (with a "consistent" estimate of the weighting matrix), and hence it can be concluded that MAPprox(2) provides minimum variance estimates of the DOAs. However, the MAPprox cost function depends on θ in quite a complicated manner. In Section 3.8 we compare MAPprox2 and GWSF by means of several simulations, and the strong similarities between these two approaches are clearly visible. Since the implementation of GWSF is much easier, its use is preferred.
3.8 Simulation Examples<br />
In this section, we illustrate the findings of this chapter by means of simulation examples. The MAP-NSF, MAPprox2 and GWSF methods are compared with one another, as well as with methods like WSF (MODE) that do not take the array perturbations into account. In [VS94a], the asymptotic performance of WSF was analyzed for the case under study. Unfortunately, the result given in that paper contains a printing error and, therefore, we provide the following result.
Theorem 3.3. Let \(\hat\theta_{\text{WSF}}\) be the minimizing argument of V_WSF(θ) = V_WSF(θ, ρ0), where V_WSF(θ,ρ) is given in (3.3). Then
\[
\sqrt N\,(\hat\theta_{\text{WSF}} - \theta_0) \in \text{AsN}(0,\, C_{\text{WSF}}),
\]
where
\[
C_{\text{WSF}} = \frac{\sigma^2}{2}\, C^{-1} + C^{-1} F_\theta^T \bar\Omega F_\theta C^{-1}.
\tag{3.53}
\]
Proof. Since WSF is a special case of GWSF, the result follows from the analysis in Section 3.5. Indeed, if in GWSF we use the weighting \(W = \bar L^{-1}\) (which makes V_GWSF ∝ V_WSF), then \(C_{\text{WSF}} = H^{-1}QH^{-1}\), where H and Q are given in (A.16) and (A.17), respectively.

It can be seen from the expression (3.53) that the first term corresponds to the CRB for the case with no model errors. The second term in (3.53) reflects the performance degradation of WSF due to model errors.
3.8.1 Example 1<br />
Consider a uniform linear array consisting of m = 10 sensors separated by a half wavelength. Two signals impinge from the directions θ0 = [0° 5°]^T relative to broadside. The signals are uncorrelated and the SNR is 5 dB: P/σ² = 10^{5/10} I₂. The nominal unit gain sensors are perturbed by additive Gaussian random variables with variance 0.01. This corresponds to an uncertainty in the gain with a standard deviation of 10%. Figure 3.1 depicts the root-mean-square (RMS) error versus the sample size for different methods. (Only the RMS values for θ1 are displayed; the results corresponding to θ2 are similar.) In Figure 3.1 and in the graphs that follow, we show simulation results using different symbols, while the theoretical results are displayed by lines. The empirical RMS values are computed from 1000 independent trials in all simulations. The MAP-NSF and MAPprox2 cost functions are minimized with Newton-type methods initialized by the WSF estimate. WSF (or, equivalently, MODE [SS90a]) is implemented in the "rooting form" described in [SS90b] (see Section 2.4). The GWSF method is implemented as described in Section 3.6.2. In the examples presented here, we reiterated the third step of the algorithm three times. The poor performance of MAP-NSF in Figure 3.1 is due to two
Figure 3.1: RMS errors for θ1 versus N.<br />
main reasons. First, MAP-NSF fails to resolve the two signals in many cases. Second, the numerical search is sometimes trapped in a local minimum. The performance of GWSF and MAPprox2 is excellent and close to the accuracy predicted by the asymptotic analysis. However, the numerical minimization of the MAPprox2 cost function is complicated. In this example, MAPprox2 produced "outliers" in about five percent of the 1000 trials. These outliers were removed before the RMS value for MAPprox2 was calculated. For small N, GWSF and WSF have a similar performance since the finite sample effects dominate. However, for larger N (when calibration errors affect the performance), it can be seen that GWSF and MAPprox2 are significantly better than the standard WSF method.
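The data generation in this example can be sketched as follows (Python, not from the thesis); the estimator itself is omitted, and the circular Gaussian signal model and function names are illustrative choices consistent with the stated setup (m = 10 half-wavelength ULA, uncorrelated signals at 5 dB SNR, gain perturbations with variance 0.01).

```python
import numpy as np

def snapshots(N, doas_deg, m=10, snr_db=5.0, gain_var=0.01, rng=None):
    """Generate N array snapshots for the Example 1 setup (sketch).

    Half-wavelength ULA with nominal unit-gain sensors; the true gains
    are perturbed by real Gaussian variables with variance gain_var.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.radians(doas_deg)
    d = len(theta)
    A = np.exp(1j * np.pi * np.outer(np.arange(m), np.sin(theta)))
    g = 1.0 + np.sqrt(gain_var) * rng.standard_normal(m)   # perturbed gains
    A = g[:, None] * A
    p = 10 ** (snr_db / 10)                    # P = p*I with sigma^2 = 1
    s = np.sqrt(p / 2) * (rng.standard_normal((d, N))
                          + 1j * rng.standard_normal((d, N)))
    n = np.sqrt(0.5) * (rng.standard_normal((m, N))
                        + 1j * rng.standard_normal((m, N)))
    return A @ s + n

X = snapshots(200, [0.0, 5.0], rng=np.random.default_rng(5))
R_hat = X @ X.conj().T / X.shape[1]            # sample covariance
print(R_hat.shape)                             # (10, 10)
```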
Figure 3.2: RMS errors for θ1 versus the variance of the uncertainty in the gain of the sensors.
3.8.2 Example 2<br />
Here, we have the same set-up as in the previous example, but the sample size is fixed to N = 200 and the variance of the gain uncertainty is varied. In Figure 3.2, the RMS errors for θ1 are plotted versus the variance of the sensor gains. The curve denoted by CRB in Figure 3.2 is the Cramér-Rao lower bound for the case with no model errors; that is, CRB = σ²C⁻¹/2N. Notice that GWSF and MAPprox2 begin to differ for large model errors. Indeed, the equivalence of GWSF and MAPprox shown in Section 3.7 holds for large N and small model errors. The three methods (GWSF, MAPprox2 and MAP-NSF) are known to be asymptotically efficient estimators for this case only; the performance for large model errors is not predicted by the theory in this chapter.
3.8.3 Example 3<br />
This example involves uncertainty in the phase response of the sensors and is taken from [WOV91]. Two plane waves are incident on a uniform linear array of m = 6 sensors with half wavelength inter-element spacing. The signals are uncorrelated and 10 dB above the noise, and have DOAs θ0 = [0° 10°]^T relative to broadside. The uncertainty in the phase of the sensors is modeled by adding uncorrelated zero-mean Gaussian random variables with covariance
\[
\Omega = 10^{-3}
\begin{bmatrix}
10 & 0 & 0 & 0 & 0 & 0\\
0 & 1 & 0 & 0 & 0 & 0\\
0 & 0 & 1 & 0 & 0 & 0\\
0 & 0 & 0 & 1 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 0 & 0 & 10
\end{bmatrix}
\]
to the nominal phase, which is zero for all sensors. RMS errors for θ1 are shown in Figure 3.3. It appears that MAP-NSF has a higher small sample threshold than GWSF and MAPprox2. For large samples, the performance of the methods is similar. Again, it is clearly useful to take the array uncertainty into account when designing the estimator.
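The phase perturbation model of this example can be sketched as below; the function name is illustrative, and only the data generation (not any estimator) is shown.

```python
import numpy as np

def perturbed_steering(theta_deg, m=6, rng=None):
    """Example 3 sketch: half-wavelength ULA steering vectors whose
    nominal (zero) sensor phases are perturbed by zero-mean Gaussian
    variables with covariance Omega = 1e-3 * diag([10, 1, 1, 1, 1, 10])."""
    rng = np.random.default_rng() if rng is None else rng
    omega_diag = 1e-3 * np.array([10, 1, 1, 1, 1, 10])
    phase = np.sqrt(omega_diag) * rng.standard_normal(m)   # phase errors
    k = np.arange(m)
    A = np.exp(1j * np.pi * np.outer(k, np.sin(np.radians(theta_deg))))
    return np.exp(1j * phase)[:, None] * A

A = perturbed_steering([0.0, 10.0], rng=np.random.default_rng(6))
print(A.shape)   # (6, 2)
```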
3.8.4 Example 4<br />
Here, we consider the same set-up as in the previous example, but the signals are correlated. More precisely, the signal covariance matrix is
\[
P = 10
\begin{bmatrix}
1 & -0.99\\
-0.99 & 1
\end{bmatrix}
\]
and the noise variance is σ² = 1. The RMS errors for θ1 are shown in Figure 3.4. It can be seen that the two signal subspace fitting methods GWSF and WSF can resolve the two signals, while the noise subspace fitting method MAP-NSF has a much higher threshold.
3.9 Conclusions<br />
This chapter has studied the problem of developing robust weighted signal subspace fitting algorithms for arrays with calibration errors. It was shown that if the second-order statistics of the calibration errors
Figure 3.3: RMS errors for θ1 versus the number of snapshots, N.<br />
are known, then asymptotically statistically efficient weightings can be derived for very general perturbation models. Earlier research had resulted in optimal weightings that were applicable only for very special cases. The GWSF technique derived herein unifies earlier work and also enjoys several advantages over the MAP-NSF approach, another statistically efficient technique developed for calibration errors. In particular, since GWSF is based on the signal rather than the noise subspace, it is a consistent estimator even when the signals are perfectly coherent, and has better finite sample performance than MAP-NSF when the signals are highly correlated or closely spaced. In addition, unlike MAP-NSF, when the array is nominally uniform and linear, the GWSF criterion can be re-parameterized in such a way that the directions of arrival may be solved for by rooting a polynomial rather than via a gradient search. As a byproduct of our analysis, we also demonstrated the asymptotic statistical efficiency of the MAPprox algorithm, another robust DOA estimator for perturbed arrays. While the MAPprox and GWSF estimators have a
Figure 3.4: RMS errors for θ1 versus the number of snapshots, N.<br />
very similar performance, GWSF depends on the parameters in a simpler way than MAPprox, which facilitates the implementation. Our simulations indicate that, in the presence of calibration errors with known statistics, both algorithms can yield a significant performance improvement over techniques that ignore such errors.
Chapter 4<br />
A Direction Estimator for<br />
Uncorrelated Emitters<br />
In this chapter, a novel eigenstructure-based method for direction estimation is presented. The method assumes that the emitter signals are uncorrelated. Ideas from subspace and covariance matching methods are combined to yield a non-iterative estimation algorithm when a uniform linear array is employed. The large sample performance of the estimator is analyzed. It is shown that the asymptotic variance of the direction estimates coincides with the relevant Cramér-Rao lower bound (CRB). A compact expression for the CRB is derived for the case when it is known that the signals are uncorrelated; this bound is lower than the CRB usually used in the array processing literature (which assumes no particular structure for the signal covariance matrix). The difference between the two CRBs can be large in difficult scenarios. This implies that, in such scenarios, the proposed method has significantly better performance than existing subspace methods such as, for example, WSF, MUSIC and ESPRIT. Numerical examples are provided to illustrate the obtained results.
4.1 Introduction<br />
The problem of estimating the directions of arrival (DOAs) of signals from array data is well documented in the literature. A number of high resolution algorithms or eigenstructure methods have been presented and analyzed (see, e.g., [VO91, Sch81, RK89, SS90a, SS91, BM86, RH89, SN89b]). In some applications, such as radio astronomy and communications, it is reasonable to assume that the signals are spatially uncorrelated. One disadvantage of eigenstructure or, so-called, subspace based methods is that it is difficult to incorporate prior knowledge of the signal correlation into the eigendecomposition. Hence, it is difficult to use this prior information to increase the estimation accuracy.
Herein, we propose an estimator that combines ideas from subspace and covariance matching methods [TO96, WSW95], which makes it possible to incorporate prior knowledge of the signal correlation into the estimator. It should be noted that the Cramér-Rao lower bound (CRB) usually used when comparing estimators in array signal processing is not the proper one in this scenario. As will be shown here, the CRB that should be used to compare estimators that use priors on the source correlation is in general lower than that presented in, e.g., [SN89a, VO91]. We derive the relevant CRB for the case under study and give a compact matrix expression for the CRB on the DOA parameters. It is also shown that the proposed method yields estimates that asymptotically attain this CRB on the estimation error covariance.
The proposed estimator is based on a weighted least squares fit of certain elements in the eigendecomposition of the output covariance of the array. In general, this leads to a multi-modal cost function which has to be minimized with some multi-dimensional search technique. This will, of course, lead to a computationally rather unattractive problem. However, for the common special case of a uniform linear array (ULA), the cost function can be reparameterized in a similar manner as in MODE and IQML [SS90b, SS90a, BM86] (see also [JGO97b]). In this case, the estimates of the directions can be found from the roots of a certain polynomial. This leads to a significant computational simplification. Note that a similar reparameterization and "non-iterative" scheme has not yet been found for maximum-likelihood (ML) or covariance matching techniques. Thus, the presented method has the important property of yielding a non-iterative solution with the same asymptotic accuracy as ML.
This chapter is organized as follows. In Section 4.2, the model and assumptions are given. In Section 4.3, we present the DOA estimator. In Section 4.4, the Cramér-Rao bound for uncorrelated sources is derived. The proposed estimator is analyzed in Section 4.5, where the large sample properties of the estimates are derived. In Section 4.5, we also derive the optimal weighting matrix leading to minimum variance estimates of the DOAs. The numerical implementation of the method when a ULA is employed is discussed in Section 4.6. Numerical examples are provided in
Section 4.7 to demonstrate the performance of the proposed algorithm. Finally, some conclusions are given in Section 4.8.
4.2 Data Model and Assumptions
Consider d uncorrelated, narrow-band, plane waves impinging on an unambiguous array consisting of m sensors. The spatial response of the array is assumed to be parameterized by the directions of arrival, θ = [θ1, …, θd]^T. The functional form of the array response vector, a(θ), is assumed to be known. This scenario is described by the model

x(t) = A(θ) s(t) + n(t),

where the array response matrix, A(θ) = [a(θ1), …, a(θd)], is the collection of the array response vectors. The source signals, s(t), and the additive noise, n(t), are assumed to be zero-mean Gaussian random sequences with second-order moments given by

E{s(t) s^T(τ)} = 0,    E{n(t) n^T(τ)} = 0,
E{s(t) s^*(τ)} = P δ_{t,τ},    E{n(t) n^*(τ)} = σ² I δ_{t,τ},

where δ is the Kronecker delta, and where the superscripts T and * stand for transposition and complex conjugate transposition, respectively. For later use, we also introduce the superscript c for complex conjugation. Notice that, since the sources are assumed to be spatially uncorrelated, the signal covariance matrix, P, is diagonal. With the definitions above, and under the assumption that s(t) and n(t) are uncorrelated, the output covariance matrix of the array is given by

R = E{x(t) x^*(t)} = A(θ) P A^*(θ) + σ² I.   (4.1)

It is further assumed that the number of signals is known, or has been consistently estimated [VOK91a, Wax92].
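As a concrete illustration, the following NumPy sketch builds the covariance in (4.1) for a hypothetical scenario (a half-wavelength ULA with m = 4 sensors and two uncorrelated sources; all numerical values are assumptions made for this sketch, not taken from the thesis) and checks the eigenvalue structure exploited in the next section.

```python
import numpy as np

def ula_response(theta_deg, m, delta=0.5):
    """Steering vector a(theta) of a ULA with m omnidirectional sensors
    spaced delta wavelengths apart; theta in degrees from broadside."""
    omega = 2 * np.pi * delta * np.sin(np.deg2rad(theta_deg))
    return np.exp(1j * omega * np.arange(m))

# Assumed toy scenario: d = 2 uncorrelated sources, m = 4 sensors.
m, d, sigma2 = 4, 2, 0.5
A = np.column_stack([ula_response(t, m) for t in (0.0, 20.0)])
P = np.diag([1.0, 2.0])            # diagonal since sources are uncorrelated

# Array output covariance, Eq. (4.1).
R = A @ P @ A.conj().T + sigma2 * np.eye(m)

# R is Hermitian, and its m - d smallest eigenvalues all equal sigma^2,
# which is exactly the structure the subspace method relies on.
eigvals = np.linalg.eigvalsh(R)    # ascending order
assert np.allclose(R, R.conj().T)
assert np.allclose(eigvals[: m - d], sigma2)
```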
4.3 Estimating the Parameters
The problem under consideration is that of estimating the unknown parameters in (4.1) from sensor array data. Of main interest are the DOAs, θ. The estimation procedure should use the prior knowledge that the signals are uncorrelated to increase the estimation accuracy.
The subspace estimation techniques rely on the properties of the eigendecomposition of the output covariance matrix R. Let

R = Es Λs Es^* + En Λn En^* = Es Λs Es^* + σ² En En^*   (4.2)

be a partitioned eigendecomposition of R. Here, Λs is a diagonal matrix containing the d < m largest eigenvalues, and the columns of Es are the corresponding eigenvectors. We will assume that the eigenvalues in Λs are distinct, which is generically true. Similarly, Λn contains the m − d smallest eigenvalues and En is composed of the remaining eigenvectors. The fact that Λn = σ² I follows from the assumptions that A is full rank and P is positive definite. Comparing (4.1) and (4.2) gives

A P A^* = Es Λs Es^* + σ² En En^* − σ² I = Es Λ Es^*,   (4.3)

where the definition Λ = Λs − σ² I has been used. The last step in (4.3) follows from the fact that En En^* = I − Es Es^*. Applying the vectorization operator (vec) [Gra81] to (4.3), and using the identity vec(ABC) = (C^T ⊗ A) vec(B), we obtain

(A^c ⊗ A) vec(P) = (Es^c ⊗ Es) vec(Λ),   (4.4)

where ⊗ denotes the Kronecker matrix product. Since the matrices P and Λ are diagonal, there exists a (d² × d) selection matrix L such that

vec(P) = L p  and  vec(Λ) = L λ,

where p and λ are (d × 1) vectors consisting of the diagonal entries of P and Λ, respectively. Notice that

(A^c ⊗ A) L = (A^c ◦ A),

where ◦ is the Khatri-Rao matrix product, a column-wise Kronecker product [Bre78]. The relation in (4.4) will now be used to formulate an estimator of the DOAs.
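The vec/Khatri-Rao step above can be verified numerically. The sketch below (a toy example in which a random complex matrix stands in for the array response; nothing here is taken from the thesis) checks that vec(APA^*) = (A^c ⊗ A) vec(P) collapses to the Khatri-Rao form (A^c ◦ A) p when P is diagonal, with vec implemented as column-major flattening.

```python
import numpy as np

m, d = 4, 2
rng = np.random.default_rng(0)
# Random complex stand-in for A; the identity only needs P diagonal.
A = rng.standard_normal((m, d)) + 1j * rng.standard_normal((m, d))
p = np.array([1.0, 2.0])
P = np.diag(p)

vec = lambda X: X.flatten(order="F")        # column-major vec(.)
lhs = vec(A @ P @ A.conj().T)               # vec(A P A^*)
kron = np.kron(A.conj(), A) @ vec(P)        # (A^c kron A) vec(P)
khatri_rao = np.column_stack(
    [np.kron(A[:, k].conj(), A[:, k]) for k in range(d)]
) @ p                                       # (A^c o A) p, column-wise Kronecker
assert np.allclose(lhs, kron)
assert np.allclose(lhs, khatri_rao)
```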
Let R̂ denote the sample estimate of the covariance matrix in (4.1); i.e.,

R̂ = (1/N) Σ_{t=1}^{N} x(t) x^*(t),

where N is the number of snapshots. Let

R̂ = Ês Λ̂s Ês^* + Ên Λ̂n Ên^*

be an eigendecomposition of R̂, similar to (4.2). A consistent estimate of the noise variance is given by

σ̂² = (1/(m − d)) Tr{Λ̂n} = (1/(m − d)) ( Tr{R̂} − Tr{Λ̂s} ).
Replacing Es and Λ in (4.4) by Ês and Λ̂ = Λ̂s − σ̂² I yields

(A^c ◦ A) p ≈ ( Ês^c ◦ Ês ) λ̂.   (4.5)

Using the definitions

B(θ) = (A^c(θ) ◦ A(θ)) = [ (a^c(θ1) ⊗ a(θ1))  …  (a^c(θd) ⊗ a(θd)) ]
     = [ vec(a(θ1) a^*(θ1))  …  vec(a(θd) a^*(θd)) ],   (4.6)

f̂ = ( Ês^c ◦ Ês ) λ̂,   (4.7)
we can, from (4.5), formulate the least squares problem

min_{θ,p} ‖f̂ − B(θ) p‖²₂ .   (4.8)

For fixed θ, the solution of (4.8) with respect to p is

p̂ = B^†(θ) f̂,   (4.9)

where B^† = (B^* B)^{−1} B^* denotes the Moore-Penrose pseudo-inverse of B. By substituting p̂ back into (4.8), we get

min_θ ‖f̂ − B(θ) B^†(θ) f̂‖²₂ = min_θ ‖Π^⊥_B(θ) f̂‖²₂ ,   (4.10)
where Π^⊥_B(θ) = I − B(θ) B^†(θ). In general, we will use the notation Π^⊥_X for the orthogonal projection onto the null-space of X^*. Similarly, Π_X = I − Π^⊥_X will denote the orthogonal projection onto the range-space of X. An estimate of θ can be obtained by minimizing the criterion in (4.10) (see [GJO96]). However, it is useful to introduce a weighting in the criterion to improve the accuracy of the estimates. More precisely, we suggest estimating θ as

θ̂ = arg min_θ V(θ),   (4.11)

V(θ) = ‖Π^⊥_B(θ) f̂‖²_W = f̂^* Π^⊥_B(θ) W Π^⊥_B(θ) f̂ ,   (4.12)

where W is a Hermitian positive definite weighting matrix yet to be determined (see Section 4.5).
Remark 4.1. It is readily shown that p̂ in (4.9) is real valued by making use of the results in Appendix A.8. However, p̂ is not guaranteed to be positive for small N. For large N, this is not a problem since p̂ is a consistent estimate of p. This implies that p̂ is positive for sufficiently large N, so the large sample performance of the estimates is not affected.
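To make the construction concrete, the following sketch (same assumed toy scenario as above, with W = I for simplicity) forms the exact f from an eigendecomposition and checks that the unweighted criterion vanishes at the true DOAs and is strictly positive away from them, which is the essence of the consistency argument of Section 4.5.

```python
import numpy as np

def steer(theta_deg, m, delta=0.5):
    w = 2 * np.pi * delta * np.sin(np.deg2rad(theta_deg))
    return np.exp(1j * w * np.arange(m))

def b_col(a):
    # One column of B(theta): a^c kron a = vec(a a^*).
    return np.kron(a.conj(), a)

def criterion(theta_deg, f, m):
    # V(theta) with W = I: ||P_perp f||^2.  The optimal weighting of
    # Section 4.5 is omitted; it does not move the zero at theta0.
    B = np.column_stack([b_col(steer(t, m)) for t in theta_deg])
    r = f - B @ np.linalg.pinv(B) @ f
    return np.real(r.conj() @ r)

# Assumed scenario: m = 4 half-wavelength ULA, two uncorrelated sources.
m, d, sigma2 = 4, 2, 0.5
theta0 = [0.0, 20.0]
A = np.column_stack([steer(t, m) for t in theta0])
R = A @ np.diag([1.0, 2.0]) @ A.conj().T + sigma2 * np.eye(m)

# Exact f = (Es^c o Es) lambda from the eigendecomposition of R.
lam, E = np.linalg.eigh(R)
Es, lam_s = E[:, m - d:], lam[m - d:] - sigma2
f = np.column_stack([b_col(Es[:, k]) for k in range(d)]) @ lam_s

assert criterion(theta0, f, m) < 1e-16   # zero at the true DOAs
assert criterion([5.0, 20.0], f, m) > 1e-3
```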
Minimizing (4.12) to obtain the estimate θ̂ can be accomplished with a Newton-type algorithm. The gradient and Hessian of the cost function, needed in such an implementation, are given as by-products of the statistical analysis of the estimator in Section 4.5. The criterion is in general multi-modal, rendering the multi-dimensional search for a global extremum computationally expensive. However, when a ULA is employed, it is possible to reparameterize the criterion and convert it to a linear problem, where the estimate is obtained by rooting a polynomial. This is further discussed in Section 4.6, where the reparameterization is shown in detail.
4.4 The CRB for Uncorrelated Sources

The Cramér-Rao bound (CRB) is a commonly used lower bound on the estimation error covariance of any unbiased estimator [Cra46]. Let η = [θ^T, p^T, σ²]^T denote the vector that parameterizes the covariance matrix R. Notice that we here derive the CRB when it is known that the emitter signals are uncorrelated. When the asymptotic estimation error covariance of an estimator is equal to the CRB, the estimator is referred to as asymptotically statistically efficient. We are mainly interested in the bound on the direction parameters θ. Let us denote this bound by CRBθ. We can now formulate the following result:

Theorem 4.1. Under the assumptions given in Section 4.2, the Cramér-Rao lower bound on the estimation error covariance for any unbiased estimate of θ is

CRBθ = [ P D^* G (G^* C̃ G)^{−1} G^* D P ]^{−1} .   (4.13)

Here,

C̃ = ( R^T ⊗ R ) + (σ⁴/(m − d)) vec(Π_A) vec^*(Π_A),   (4.14)

D = ( D̄^c ◦ A ) + ( A^c ◦ D̄ ),   (4.15)

D̄ = [ ∂a(θ1)/∂θ1  ⋯  ∂a(θd)/∂θd ],   (4.16)

and G is any matrix whose columns span the null-space of B^* (see also Section 4.6).

Proof. The proof is given in Appendix A.5.
The CRB for the "standard" case, when P is allowed to be any Hermitian positive definite matrix, is given, e.g., in [SN89a, VO91]. A comparison of (4.13) with the corresponding CRB expressions in [SN89a, VO91] can be used to quantify the gain in performance from using the prior information about the signal correlation. An example of such a comparison is given in Section 4.7.
4.5 Asymptotic Analysis

The asymptotic (in N) behavior of the estimates given by (4.11) is analyzed in this section. The derivation of the estimator in Section 4.3 is rather ad hoc. However, below we show that the estimator is in fact asymptotically statistically efficient. We start the asymptotic analysis by establishing the consistency of the estimates.
4.5.1 Consistency

In what follows, we use θ0 to distinguish the true DOA vector from a generic vector θ. Notice that f̂ converges to

f = (Es^c ◦ Es) λ = (A^c(θ0) ◦ A(θ0)) p = B(θ0) p

with probability one as N → ∞. Since the norm of Π^⊥_B(θ) is bounded, the criterion (4.12) tends to

V̄ = f^* Π^⊥_B(θ) W Π^⊥_B(θ) f
uniformly in θ with probability one. The consistency of θ̂ in (4.11) then follows if θ = θ0 is the unique solution to V̄ = 0. Notice that V̄ = 0 if and only if Π^⊥_B(θ) f = Π^⊥_B(θ) B(θ0) p = 0, since W > 0 by assumption. Assume, without loss of generality, that the first n DOAs coincide with the true ones, but not all do; that is, θi = θ0i for i = 1, …, n, where 0 ≤ n < d. Then

Π^⊥_B(θ) f = Π^⊥_B(θ) [B1 B2] [p1^T p2^T]^T = Π^⊥_B(θ) B2 p2
           = [B2 B(θ)] [ I_{d−n} ; −B^†(θ) B2 ] p2,

where the matrix [B2 B(θ)] has 2d − n columns, B1 and B2 contain the first n and the last d − n columns of B(θ0), respectively, and p has been partitioned accordingly (the term Π^⊥_B(θ) B1 p1 vanishes since the columns of B1 are also columns of B(θ)). In Appendix A.6 it is proved that [B2 B(θ)] is full rank if m > (2d − n)/2. This condition is satisfied for all admissible n if m > d. Clearly, the only solution to Π^⊥_B(θ) f = 0 is then p2 = 0, which contradicts the fact that p consists of the signal powers. Thus, the only valid solution is θ = θ0, and strong consistency of the estimates follows.
Remark 4.2. The identifiability condition d < m is natural if no additional assumptions are made. Consider, for example, the case of a ULA, for which A(θ) is a Vandermonde matrix. The array covariance matrix (4.1) is then a Toeplitz matrix, since P is diagonal. It is well known from the Carathéodory parameterization that any Hermitian, positive definite, Toeplitz matrix can be written in the form

R = A P A^* + σ² I,

with d < m (see, e.g., [SM97]). This implies that if d ≥ m signals are received by the array, the scenario cannot be distinguished from another scenario with d < m (based on the information in R). On the other hand, for particular array geometries it is possible to estimate more signals than sensors (d ≥ m) under the assumption of uncorrelated emitter signals (see, e.g., [PPL90]).
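The Toeplitz structure invoked in the remark is easy to confirm numerically. The sketch below (assumed ULA scenario; the values are illustrative) checks that every diagonal of R is constant.

```python
import numpy as np

# For a ULA and spatially uncorrelated sources (diagonal P), the
# covariance of (4.1) is Hermitian Toeplitz: R_ij depends on i - j only.
m, sigma2 = 5, 1.0
omega = 2 * np.pi * 0.5 * np.sin(np.deg2rad([-10.0, 25.0]))
A = np.exp(1j * np.outer(np.arange(m), omega))
R = A @ np.diag([1.5, 0.7]) @ A.conj().T + sigma2 * np.eye(m)

# Every diagonal of a Toeplitz matrix is constant.
for k in range(m):
    diag = np.diagonal(R, offset=k)
    assert np.allclose(diag, diag[0])
```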
4.5.2 Asymptotic Distribution

After establishing consistency, the asymptotic distribution of the estimates from (4.11) can be derived through a Taylor series expansion approach (see, e.g., [VO91, SS89]). Let V′(θ) and V″(θ) denote the gradient and the Hessian of V(θ), respectively. By definition V′(θ̂) = 0, and since θ̂ is consistent, a first order Taylor series expansion of V′(θ̂) around θ0 leads to

θ̃ = θ̂ − θ0 ≃ −H^{−1} V′(θ0),   (4.17)

where ≃ denotes equality in probability up to first order. The notation H refers to the limiting Hessian matrix,

H = lim_{N→∞} V″(θ0).   (4.18)

The derivative of (4.12) with respect to θk (i.e., the kth element of the parameter vector θ) is given by

V′_k = f̂^* {Π^⊥_B}_k W Π^⊥_B f̂ + f̂^* Π^⊥_B W {Π^⊥_B}_k f̂
     ≃ −2 Re{ f^* B^{†*} B_k^* Π^⊥_B W Π^⊥_B f̂ },   (4.19)

where, for notational simplicity, the argument (θ0) is omitted, and B_k denotes the derivative of B with respect to θk. In (4.19), we made use of the following expression for the derivative of the projection matrix:

{Π^⊥_B}_k = −Π^⊥_B B_k B^† − B^{†*} B_k^* Π^⊥_B .   (4.20)

We have also used the facts that f̂ → f as N → ∞ and Π^⊥_B f = 0. Next, we relate f̂ to the sample covariance. From the definition of f̂ in (4.7), we have
f̂ = vec( Ês (Λ̂s − σ̂² I) Ês^* )
   = vec( R̂ − σ̂² I − Ên (Λ̂n − σ̂² I) Ên^* )
   = vec( R̂ − σ̂² I − Ên Ên^* (R̂ − σ̂² I) Ên Ên^* )
   ≃ vec( R̂ − σ̂² I − Π^⊥_A (R̂ − σ̂² I) Π^⊥_A )
   = vec( R̂ − Π^⊥_A R̂ Π^⊥_A − σ̂² Π_A ),   (4.21)

where Π^⊥_A = I − Π_A = Π^⊥_{A(θ0)}. In (4.21), use is made of the facts that Ên Ên^* → En En^* = Π^⊥_A and Π^⊥_A (R̂ − σ̂² I) → 0 as N → ∞. Recall that the noise variance is estimated as the average of the noise eigenvalues in Λ̂n, and notice that

(m − d)(σ̂² − σ²) = Tr{ Λ̂n − σ² I_{m−d} } = Tr{ Ên^* ( R̂ − σ² I_m ) Ên }
 = Tr{ Ên Ên^* ( R̂ − σ² I_m ) Ên Ên^* } ≃ Tr{ En En^* ( R̂ − σ² I_m ) En En^* }
 = Tr{ Π^⊥_A R̂ } − (m − d) σ²,

where I_k denotes a k × k identity matrix. Using the identity Tr{AB} = vec^*(A^*) vec(B), this implies that

σ̂² ≃ (1/(m − d)) Tr{ Π^⊥_A R̂ } = (1/(m − d)) vec^*(Π^⊥_A) vec(R̂).   (4.22)
Using (4.22) in (4.21), we get

f̂ = ( I − (Π^⊥_A)^T ⊗ Π^⊥_A − (1/(m − d)) vec(Π_A) vec^*(Π^⊥_A) ) vec(R̂)
   ≜ M vec(R̂).   (4.23)
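As a numerical sanity check of (4.23), the following sketch (same assumed toy ULA scenario as earlier) builds the matrix M and verifies that, applied to the exact covariance, M vec(R) reproduces f = vec(APA^*).

```python
import numpy as np

# Assumed toy scenario: exact covariance of a 4-sensor half-wavelength
# ULA with two uncorrelated sources.
m, d, sigma2 = 4, 2, 0.8
omega = 2 * np.pi * 0.5 * np.sin(np.deg2rad([0.0, 20.0]))
A = np.exp(1j * np.outer(np.arange(m), omega))
P = np.diag([1.0, 2.0])
R = A @ P @ A.conj().T + sigma2 * np.eye(m)

# Projections onto the range of A and its orthogonal complement.
PiA = A @ np.linalg.pinv(A)
PiA_perp = np.eye(m) - PiA

vec = lambda X: X.flatten(order="F")        # column-major vec(.)
M = (np.eye(m * m)
     - np.kron(PiA_perp.T, PiA_perp)
     - np.outer(vec(PiA), vec(PiA_perp).conj()) / (m - d))

# Applied to the exact R, M strips both the noise term and the noise
# subspace contribution, leaving f = vec(A P A^*).
f = vec(A @ P @ A.conj().T)
assert np.allclose(M @ vec(R), f)
```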
From the central limit theorem it follows that the elements of √N(R̂ − R) are asymptotically zero-mean Gaussian distributed. The relations (4.19), (4.17), and (4.23) then show that this holds for √N V′(θ0) and √N θ̃ as well. We have

√N V′(θ0) ∈ AsN(0, Q),

where

Q = lim_{N→∞} N E{V′ V′^T},   (4.24)

and where the notation AsN means asymptotically Gaussian distributed. The asymptotic second-order properties of θ̃ are given by the following result:
Theorem 4.2. If m > d, then the estimate θ̂ obtained from (4.11) is a consistent estimate of θ0, and the normalized estimation error is asymptotically Gaussian distributed according to

√N( θ̂ − θ0 ) ∈ AsN(0, Γ),

where

Γ = H^{−1} Q H^{−1}.

The matrices H and Q are given by

H = 2 Re{ P D^* Π^⊥_B W Π^⊥_B D P },   (4.25)

Q = 2 Re{ U^* C̄ U^c + U^* C U },   (4.26)

where

U = Π^⊥_B W Π^⊥_B D P,   (4.27)

and where D is defined in (4.15). The two covariance matrices C and C̄ are defined as

C = lim_{N→∞} N E{ f̃ f̃^* },   (4.28)

C̄ = lim_{N→∞} N E{ f̃ f̃^T },   (4.29)

where f̃ = f̂ − f. Explicit expressions for C and C̄ are given in Appendix A.7. All quantities are evaluated at the true DOA vector θ0.

Proof. The derivation of H and Q can be found in Appendix A.7.
4.5.3 Optimal Weighting

The result in Theorem 4.2 is valid for any positive definite weighting W. However, it is of course of interest to find a weighting such that the asymptotic covariance matrix Γ is minimized (in the sense of positive definiteness). A lower bound on Γ is given by the CRB in Theorem 4.1. Next, we show that there indeed exists a weighting matrix W such that Γ attains the CRB. This implies that the estimate in (4.11) is asymptotically statistically efficient.

Theorem 4.3. If

W = Wopt = ( Π^⊥_B C̃ Π^⊥_B + B B^* )^{−1},   (4.30)

where C̃ is defined in (4.14), then

Γ = CRBθ.

Here, CRBθ is the Cramér-Rao lower bound on the estimation error covariance of θ corresponding to the prior knowledge that the signals are uncorrelated (see Theorem 4.1).
Proof. The proof can be found in Appendix A.8.

Note that the optimal weighting matrix in (4.30) depends on the true parameters. However, since Π^⊥_B(θ0) f = 0, it can be shown that Wopt can be replaced by a consistent estimate without changing the asymptotic analysis given above. In particular, this implies that the estimator can be implemented as a two-step procedure, as further explained in Section 4.6.
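The optimal weighting can be formed directly from (4.14) and (4.30) at the true parameters. The sketch below (assumed toy scenario; not code from the thesis) builds Wopt and checks that it is a valid Hermitian positive definite weighting.

```python
import numpy as np

# Assumed toy scenario evaluated at the true parameters.
m, d, sigma2 = 4, 2, 0.8
omega = 2 * np.pi * 0.5 * np.sin(np.deg2rad([0.0, 20.0]))
A = np.exp(1j * np.outer(np.arange(m), omega))
R = A @ np.diag([1.0, 2.0]) @ A.conj().T + sigma2 * np.eye(m)

vec = lambda X: X.flatten(order="F")
PiA = A @ np.linalg.pinv(A)
B = np.column_stack([np.kron(A[:, k].conj(), A[:, k]) for k in range(d)])
PiB_perp = np.eye(m * m) - B @ np.linalg.pinv(B)

# C-tilde of (4.14) and W_opt of (4.30).
C_tilde = (np.kron(R.T, R)
           + (sigma2 ** 2 / (m - d)) * np.outer(vec(PiA), vec(PiA).conj()))
W_opt = np.linalg.inv(PiB_perp @ C_tilde @ PiB_perp + B @ B.conj().T)

# The sum inside the inverse is Hermitian positive definite (the two
# terms act on complementary subspaces), so W_opt is as well.
assert np.allclose(W_opt, W_opt.conj().T)
assert np.linalg.eigvalsh(W_opt).min() > 0
```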
Remark 4.3. By making use of the results in Appendix A.5 and Appendix A.8, it is possible to show that f̂ can be replaced by vec{R̂ − σ̂² I} in the criterion (4.12) without changing the asymptotic second order properties of θ̂ [GO97]. We will not elaborate more on this here.
4.6 Implementation

This section deals with the implementation of the estimator (4.11) when a uniform linear array is employed. For general arrays, the criterion in (4.12) can be minimized by a Newton-type method. In the following, we discuss a way to avoid gradient search algorithms. For ULAs, a technique similar to the one used in MODE [SS90b, SS90a] can be utilized. Note that, when the array consists of uniformly spaced omnidirectional sensors, the steering matrix, A, takes the form

A = [ 1              ⋯  1
      e^{iω1}        ⋯  e^{iωd}
      ⋮                  ⋮
      e^{i(m−1)ω1}   ⋯  e^{i(m−1)ωd} ],

where ωk (k = 1, …, d) are the so-called spatial frequencies. The relationship between ωk and θk is given by

ωk = 2πΔ sin(θk),

where Δ is the element spacing measured in wavelengths, and where θk is measured relative to array broadside.
The idea is to find a basis for the null-space of B^* that depends linearly on a minimal set of parameters. For that purpose, introduce the following polynomial:

g0 z^d + g1 z^{d−1} + ⋯ + gd = g0 ∏_{k=1}^{d} ( z − e^{iωk} ),   g0 ≠ 0.   (4.31)
From the definition of B in (4.6), it follows that the kth column of B is given by

B•k = [ 1  zk  ⋯  zk^{m−1}   zk^{−1}  1  ⋯  zk^{m−2}   zk^{−2}  zk^{−1}  ⋯  zk^{m−3}   ⋯   zk^{−(m−1)}  ⋯  1 ]^T,   (4.32)

where zk = e^{iωk}. The goal is to find a full rank matrix G of dimension m² × (m² − d) such that

G^* B = 0.
In this regard, it is useful to permute the rows of B, and correspondingly the columns of G^*, such that the permuted version of (4.32) reads

B̃•k = [ zk^{−m+1}  zk^{−m+2}  zk^{−m+2}  zk^{−m+3}  zk^{−m+3}  zk^{−m+3}  ⋯  zk^{m−2}  zk^{m−2}  zk^{m−1} ]^T.

Notice that the permutation is made such that elements containing zk of the same power are collected next to each other. This permutation will simplify the construction of G̃^*, the permuted version of G^*. The parameterization is best described by examples; its generalization will be straightforward from the examples given next.
Example 4.1. Let m = 3 and d = 1, which implies that we need m² − d = 8 independent rows to form G̃^*. One such G̃^* is given by

G̃^* = [ g1  g0  0   0   0   0   0   0   0
        g1  0   g0  0   0   0   0   0   0
        0   g1  0   g0  0   0   0   0   0
        0   g1  0   0   g0  0   0   0   0
        0   g1  0   0   0   g0  0   0   0
        0   0   0   g1  0   0   g0  0   0
        0   0   0   g1  0   0   0   g0  0
        0   0   0   0   0   0   g1  0   g0 ],   (4.33)

which is easily seen to be full rank and to satisfy the key relation G̃^* B̃ = 0. The column partitioning in (4.33) corresponds to the different powers of zk in B̃•k. Notice that there exist several alternative positions for g1 in the last six rows.
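The construction in Example 4.1 is easy to verify numerically. The sketch below builds G̃^* of (4.33) and the permuted column B̃ for an arbitrary (assumed) spatial frequency and checks the key relation G̃^* B̃ = 0 together with the rank.

```python
import numpy as np

# Numerical check of Example 4.1 (m = 3, d = 1) for an arbitrary,
# assumed spatial frequency omega.
omega = 0.7
z = np.exp(1j * omega)
g0, g1 = 1.0, -z                  # g0 z + g1 = g0 (z - e^{i omega})

# Permuted column: powers z^-2, z^-1, z^-1, 1, 1, 1, z, z, z^2.
B_tilde = z ** np.array([-2, -1, -1, 0, 0, 0, 1, 1, 2])

# Each row of (4.33) places g1 and g0 on a pair of entries whose
# powers of z are consecutive, so every row annihilates B_tilde.
G = np.zeros((8, 9), dtype=complex)
rows = [(0, 1), (0, 2), (1, 3), (1, 4), (1, 5), (3, 6), (3, 7), (6, 8)]
for r, (i, j) in enumerate(rows):
    G[r, i], G[r, j] = g1, g0

assert np.allclose(G @ B_tilde, 0)
assert np.linalg.matrix_rank(G) == 8
```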
Example 4.2. In this example, let m = 3 and d = 2; i.e., m² − d = 7 independent rows are needed. In this case we may take

G̃^* = [ 0   −1  1  0   0   0   0   0   0
        g2  g1  0  g0  0   0   0   0   0
        g2  g1  0  0   g0  0   0   0   0
        g2  g1  0  0   0   g0  0   0   0
        0   g2  0  g1  0   0   g0  0   0
        0   g2  0  g1  0   0   0   g0  0
        0   0   0  g2  0   0   g1  0   g0 ].   (4.34)

Clearly, the matrix G̃^* in (4.34) is full rank and G̃^* B̃ = 0. Again, notice that there are several alternative positions for g1 and g2 in many rows. The first row, containing ±1, appears because the pattern of the last six rows in (4.34) cannot be continued, since the degree of the polynomial is "too large".
By extending the pattern given in the examples above, it can be seen that there are d(d − 1)/2 rows with ±1 and m² − d²/2 − d/2 rows with g-coefficients in the general case. Altogether this gives m² − d rows, and by construction these rows are linearly independent. The matrix G^* is obtained by a column permutation of G̃^*.
Observe that the polynomial (4.31) should have its roots on the unit circle. By imposing the conjugate symmetry constraint gk = g^c_{d−k} (k = 0, …, d), it is more likely that the roots will lie on the unit circle (see [SS90b] for details). Notice that the parameterization remains linear with this constraint. To get a minimal (invertible) reparameterization we need one additional constraint. It is common to use either a linear or a norm constraint on the polynomial coefficients (see [SS90b, NK94]).
Next, notice that Π^⊥_B = Π_G = G G^†, and rewrite the criterion (4.12) as (see (A.46))

f̂^* Π^⊥_B Wopt Π^⊥_B f̂ = f̂^* G (G^* C̃ G)^{−1} G^* f̂ .   (4.35)

Since G^* f = 0, it is possible to show that the inverse in (4.35) can be replaced with a consistent estimate without altering the asymptotic properties of the estimates. The proposed procedure for estimating θ can be summarized as follows:
1. Compute a consistent estimate of θ0 by minimizing the quadratic function ‖G^* f̂‖²₂. Observe that there are only d free real parameters in G under the constraints mentioned above. Alternatively, one can use any other consistent estimator, for example, root-MUSIC or ESPRIT.

2. Based on sample data and the initial estimate of θ0 from the first step, calculate consistent estimates C̃̂ and Ĝ of C̃ and G, respectively. Minimize the quadratic criterion

f̂^* G ( Ĝ^* C̃̂ Ĝ )^{−1} G^* f̂

over the d free real parameters in G. Once the polynomial coefficients are given, θ̂ can be retrieved from the phase angles of the roots of the polynomial (4.31).

Exactly how the minimization of the two quadratic functions above is performed depends on the constraint on the polynomial coefficients. For the constraints previously mentioned, the minimization can be accomplished either by solving an eigenvalue problem or by solving an over-determined system of linear equations (see [SS90b, NK94] for details).
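To make the procedure concrete, the following sketch works through the simplest case, d = 1 and m = 3, with an exact (noise-free) f, so the first, linear step alone already recovers the DOA. The row permutation, the linear constraint g0 = 1, and all scenario values are assumptions made for this illustration.

```python
import numpy as np

# Assumed scenario: single source at 12 degrees, m = 3, half-wavelength ULA.
m, delta, sigma2 = 3, 0.5, 1.0
theta_true = 12.0
omega = 2 * np.pi * delta * np.sin(np.deg2rad(theta_true))
a = np.exp(1j * omega * np.arange(m))
R = 2.0 * np.outer(a, a.conj()) + sigma2 * np.eye(m)

# f = (Es^c o Es) lambda from the exact eigendecomposition of R.
lam, E = np.linalg.eigh(R)
es = E[:, -1]
f = np.kron(es.conj(), es) * (lam[-1] - sigma2)

# Permute f so that equal powers of z sit next to each other
# (Section 4.6); each row of G-tilde* then couples two entries with
# consecutive powers of z.
perm = [6, 3, 7, 0, 4, 8, 1, 5, 2]
ft = f[perm]
pairs = [(0, 1), (0, 2), (1, 3), (1, 4), (1, 5), (3, 6), (3, 7), (6, 8)]

# With the linear constraint g0 = 1, ||G* f||^2 is quadratic in g1
# alone and minimized in closed form: sum |g1 ft_i + ft_j|^2.
num = sum(ft[i].conj() * ft[j] for i, j in pairs)
den = sum(abs(ft[i]) ** 2 for i, j in pairs)
g1 = -num / den

# Root of z + g1 gives the spatial frequency, hence the DOA.
omega_hat = np.angle(-g1)
theta_hat = np.rad2deg(np.arcsin(omega_hat / (2 * np.pi * delta)))
assert abs(theta_hat - theta_true) < 1e-6
```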
4.7 Numerical Examples

In this section, some numerical examples and simulations are provided to illustrate the performance of the proposed estimator. Henceforth, the proposed method will be called DEUCE (direction estimator for uncorrelated emitters).

Example 4.3. First, we study the relative improvement in estimation accuracy that can be obtained by making use of the prior knowledge that the signals are uncorrelated. We consider a ULA with m = 3 sensor elements separated by a half wavelength. Two uncorrelated plane waves impinge on the array from different directions. The first source is located at 0°, while the direction to the other is varied. The SNRs of the sources are varied, but the second source is always 10 decibels (dB) stronger than the first. That is, if the SNR of the first source is denoted by SNR1, then

P/σ² = [ SNR1  0
         0     10·SNR1 ].

The CRB on the estimation error variance of θ2 is calculated with and without the prior on the signal correlation. The CRB for the case with
uncorrelated signals is given in (4.13). The expression for the CRB on θ without any prior on P is given by (see [SN89a, VO91]):

(σ²/(2N)) [ Re{ ( D̄^* Π^⊥_A D̄ ) ⊙ ( P A^* R^{−1} A P )^T } ]^{−1},   (4.36)
where ⊙ denotes elementwise multiplication. In Figure 4.1, the ratio between the two CRBs on θ2, given by (4.36) and (4.13), is shown for different θ2 and SNR1.

[Figure 4.1: The ratio between the two CRBs for θ2, versus θ2 in degrees and SNR1 in dB.]

Clearly, the ratio between the two CRBs can
be significantly larger than one in certain scenarios. Hence, there is motivation for a method that can utilize the prior information about the signal correlation. In the next example, we show that the performance of DEUCE predicted by the large sample analysis also holds for samples of practical lengths.
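For reference, the standard CRB expression (4.36) can be evaluated numerically. The sketch below implements it for an assumed instance of the Example 4.3 scenario (θ2 = 30°, SNR1 = 10 in linear scale, N = 100 snapshots; the numbers are illustrative) and checks only that the result is a valid symmetric, positive definite bound.

```python
import numpy as np

# Assumed instance of the Example 4.3 scenario: m = 3 half-wavelength
# ULA, two uncorrelated sources, second source 10 dB stronger.
m, delta, sigma2, N = 3, 0.5, 1.0, 100
theta = np.deg2rad([0.0, 30.0])
snr1 = 10.0
P = sigma2 * np.diag([snr1, 10 * snr1])

k = np.arange(m)
omega = 2 * np.pi * delta * np.sin(theta)
A = np.exp(1j * np.outer(k, omega))
# Columns of Dbar: derivatives of the steering vectors w.r.t. each DOA.
Dbar = (1j * np.outer(k, 2 * np.pi * delta * np.cos(theta))) * A

R = A @ P @ A.conj().T + sigma2 * np.eye(m)
PiA_perp = np.eye(m) - A @ np.linalg.pinv(A)

# Eq. (4.36): (sigma^2 / 2N) [ Re{ (Dbar* PiA_perp Dbar) o (P A* R^-1 A P)^T } ]^-1,
# with "o" the elementwise (Hadamard) product.
inner = ((Dbar.conj().T @ PiA_perp @ Dbar)
         * (P @ A.conj().T @ np.linalg.inv(R) @ A @ P).T)
crb = (sigma2 / (2 * N)) * np.linalg.inv(np.real(inner))

assert np.allclose(crb, crb.T)
assert np.linalg.eigvalsh(crb).min() > 0
```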
Example 4.4. In the second example, we study the small sample behavior of DEUCE. Again, we consider a ULA with m = 3 sensor elements
separated by a half wavelength. Two uncorrelated sources are located at θ = [0°, 20°]^T relative to array broadside. The signal covariance matrix is

P = [ 10^{(3/10)}   0
      0             10^{(20/10)} ]

and the noise variance is σ² = 1. The RMS error of the second source is depicted in Figure 4.2, along with the corresponding RMS error for the root-MUSIC estimates [Bar83, RH89].

[Figure 4.2: RMS error in degrees for θ2 versus the number of snapshots, N: 'x' – DEUCE, 'o' – root-MUSIC. The solid line represents the CRB when it is known that the sources are uncorrelated and the dashed line is the CRB without this knowledge.]

The RMS errors are based on 200 independent trials. For comparison, the corresponding CRBs are plotted as well. The solid line represents the CRB when the prior is incorporated (see (4.13)), and the dashed line denotes the CRB
without the prior (see (4.36)). It can be seen that the RMS error for DEUCE reaches the appropriate CRB already at rather small sample sizes. In this example, we made use of the root-MUSIC estimate in the first step of the algorithm given in Section 4.6. A linear constraint (Re(g0) = 1) on the polynomial coefficients was employed in the second step of the algorithm. The finite sample performance can be improved slightly by iterating the second step several times. However, the RMS errors in Figure 4.2 are calculated for the estimates obtained from the two-step procedure outlined in Section 4.6.
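For readers who want to reproduce the setup of Example 4.4, the array covariance can be formed in a few lines of numpy. This is a minimal sketch under the standard narrowband ULA model (unit response at the first sensor, half-wavelength spacing, angles measured from broadside); the helper name `steering` is ours, not from the thesis:

```python
import numpy as np

def steering(theta_deg, m):
    """Steering vector of a half-wavelength ULA; phase reference at sensor 0."""
    theta = np.deg2rad(theta_deg)
    return np.exp(1j * np.pi * np.arange(m) * np.sin(theta))

m = 3
A = np.column_stack([steering(0.0, m), steering(20.0, m)])
P = np.diag([10 ** (3 / 10), 10 ** (20 / 10)])  # source powers: 3 dB and 20 dB
sigma2 = 1.0
R = A @ P @ A.conj().T + sigma2 * np.eye(m)     # array covariance

# With two sources and three sensors, the noise subspace is one-dimensional,
# so the smallest eigenvalue of R equals the noise variance.
eigvals = np.linalg.eigvalsh(R)                 # ascending order
print(np.isclose(eigvals[0], sigma2))           # True
```

This rank-two-plus-noise structure of R is the low-rank property that subspace methods such as root-MUSIC exploit.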
Example 4.5. Another important issue is robustness: what happens if the sources are correlated? The following example is provided to partially answer this question. The setup is the same as in Example 4.4 except for the signal correlation. The signal covariance matrix is in this example given by

P = [ p1  p12 ; p12^*  p2 ] ,

where p1 = 10^(3/10), p2 = 10^(20/10), and where the cross-correlation is defined as p12 = √(p1 p2) ρ e^(iφ). In the simulations, the correlation coefficient,
ρ, as well as the correlation phase, φ, are varied. The DOAs are estimated from a batch of 100 snapshots. The RMS error for θ2, computed from 200 independent trials, is plotted as a function of the correlation coefficient in Figure 4.3. For comparison, the corresponding RMS error for the root-MUSIC estimates is shown as well. The CRB given by (4.36) is also displayed in Figure 4.3. As expected, DEUCE performs better than root-MUSIC for small ρ. It is interesting to observe that the two methods perform similarly as the correlation increases.
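The correlated signal covariance of this example is easy to form programmatically; the following is a sketch (the function name `signal_covariance` is ours):

```python
import numpy as np

def signal_covariance(rho, phi_deg):
    """Signal covariance of Example 4.5: powers p1, p2 and
    cross-correlation p12 = sqrt(p1*p2) * rho * exp(i*phi)."""
    p1, p2 = 10 ** (3 / 10), 10 ** (20 / 10)
    p12 = np.sqrt(p1 * p2) * rho * np.exp(1j * np.deg2rad(phi_deg))
    return np.array([[p1, p12], [np.conj(p12), p2]])

P = signal_covariance(0.5, 60.0)
# P is Hermitian and positive definite for |rho| < 1.
print(np.allclose(P, P.conj().T))          # True
print(np.all(np.linalg.eigvalsh(P) > 0))   # True
```

For ρ = 1 the matrix becomes singular, which is the fully coherent case where subspace methods are known to break down.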
4.8 Conclusions

In this chapter, a method (DEUCE) for estimating the locations of uncorrelated emitter signals from array data was proposed and analyzed. The asymptotic distribution of the direction estimates was derived. The asymptotic variance was shown to coincide with the Cramér-Rao lower bound that incorporates the knowledge that the signals are uncorrelated. This means that DEUCE is asymptotically statistically efficient. It was also shown that DEUCE can be implemented as a two-step procedure for uniform linear arrays. This makes the method quite attractive since it provides
minimum variance estimates without having to resort to non-linear iterative minimization. Some computer simulated examples indicate that DEUCE is rather robust against violations of the assumption that the emitters are uncorrelated. For correlated emitters, DEUCE behaves similarly to root-MUSIC. However, since DEUCE is statistically efficient, it outperforms root-MUSIC for the case of uncorrelated emitters.
[Figure 4.3: The RMS error (in degrees) for θ2 versus the correlation coefficient, ρ: 'x' – DEUCE, 'o' – root-MUSIC. The dashed line represents the CRB without prior knowledge given in (4.36). Figure 4.3a: correlation phase φ = 0°. Figure 4.3b: φ = 60°. Figure 4.3c: φ = 120°. Figure 4.3d: φ = 180°.]
Part II

Subspace System Identification
Chapter 5

A Linear Regression Approach to Subspace System Identification
Recently, state-space subspace system identification (4SID) has been suggested as an alternative to the more traditional prediction error system identification. The aim of this chapter is to analyze the connections between these two different approaches to system identification. The conclusion is that 4SID can be viewed as a linear regression multistep-ahead prediction error method with certain rank constraints. This allows us to describe 4SID methods within the standard framework of system identification and linear regression estimation. For example, this observation is used to compare different cost-functions which occur rather implicitly in the ordinary framework of 4SID. From the cost-functions, estimates of the extended observability matrix are derived and related to previous work. Based on the estimates of the observability matrix, the asymptotic properties of two pole estimators, namely the shift invariance method and a weighted subspace fitting method, are analyzed. Expressions for the asymptotic variances of the pole estimation error are given. From these expressions, difficulties in choosing user-specified parameters are pointed out. Furthermore, it is found that a row-weighting in the subspace estimation step does not affect the pole estimation error asymptotically.
5.1 Introduction

Algorithms for state-space subspace system identification (4SID) have attracted considerable interest recently [VD94b, Lar90, Ver94, VOWL93]. An overview of 4SID methods can be found in [VD94a, Vib94]. The subspace system identification schemes considered are very attractive since they estimate a state-space representation directly from input-output data. This is done without requiring a canonical parameterization, as is often necessary in traditional prediction error methods [Lju87, SS89]. This requirement has been relaxed in [McK94b, McK94a], where it is shown that an over-parameterization of the state-space model does not necessarily lead to performance degradation. However, the estimation method analyzed therein requires a large parameter search. The 4SID methods use reliable numerical tools such as the QR-decomposition and the singular value decomposition (SVD), which lead to numerically efficient implementations. The nonlinear iterative optimization schemes that are usually necessary in the traditional framework are avoided. At the same time, however, this introduces new problems in terms of understanding the properties of the algorithms. Perhaps the most attractive property of 4SID is that the computations for multivariable systems are just as easy as for single-input single-output systems. Another interesting property is the connection to model reduction [VD95a].
In the last few years, several new 4SID algorithms have been tested both in practice and in computer simulations. The results have been compared with those obtained with classical tools. In general, 4SID methods seem to perform very well, often close to optimally. The motivation for our work is to further increase the understanding of 4SID methods. The objective is to study the quality of the estimated models, to try to understand the performance limitations and capabilities of 4SID methods and, if possible, to suggest improvements and give user guidelines.
In this chapter, 4SID methods are phrased in a linear regression framework. This allows us to analyze the problem in a more traditional way. One advantage is that this explains more clearly the partition of the data into past and future that is made in 4SID. The linear regression formulation is also useful for relating different approaches to 4SID. Specifically, the estimation of the extended observability matrix (the subspace estimation) is discussed in some detail based on different cost-functions. The estimates are shown to be identical to estimates used in existing algorithms. To compare different methods, we consider the asymptotic performance of two pole estimation techniques which are based on an estimate of
the extended observability matrix. The methods are the shift invariance method [Kun78, HK66] and the weighted subspace fitting method proposed in [OV94]. We give explicit expressions for the asymptotic covariance of the estimates. These expressions are rather involved, and it is difficult to interpret the results. However, one interesting general observation is that a row-weighting in the subspace estimation step does not affect the pole estimation error asymptotically. This result holds both for the shift invariance method and for the optimally weighted subspace fitting method. In order to gain further insight, we specialize to certain numerical examples which show that the choice of method and user-provided parameters is not obvious.

The rest of this chapter is organized as follows. In Section 5.2, the problem description is given along with the general assumptions made. In Section 5.3, we present the linear regression interpretation of 4SID. Based on the linear regression model, subspace estimation is discussed in some detail in Section 5.4. Section 5.5 introduces the two pole estimation algorithms considered. In Section 5.6, we analyze the asymptotic performance of these two methods and give numerical illustrations of the results in Section 5.7. Section 5.8 concludes the chapter.
5.2 Model and Assumptions

Consider a linear time-invariant system and assume that the dynamics of the system can be described by an nth order state-space realization

x(t + 1) = Ax(t) + Bu(t) + w(t) ,   (5.1a)
y(t) = Cx(t) + Du(t) + v(t) .   (5.1b)

Here, x(t) is an n-dimensional state vector, u(t) is a vector of m input signals, w(t) represents the process noise, the vector y(t) contains l output signals, and v(t) is additive measurement noise. The system matrices have appropriate dimensions. The system is assumed to be asymptotically stable, i.e., the eigenvalues of A lie strictly inside the unit circle. Furthermore, for certain theoretical derivations, A is assumed to be diagonalizable.
It is well known that the realization in (5.1) is not unique. In fact, any realization characterized by the set

{A, B, C, D}

has the same input-output description as the realization given by the set

{T^(−1)AT, T^(−1)B, CT, D}

(if the process noise is properly scaled), where T is any full rank matrix. This is known as a similarity transformation, and the two realizations are said to be similar. The objective of identifying the system description in (5.1) can be stated as: estimate the system order n, a matrix set {A, B, C, D} which is similar to the true description, and possibly the noise covariances, from the measured data {u(t), y(t)}, t = 1, ..., N.
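The invariance under similarity transformations is easy to illustrate numerically: both realizations share all Markov parameters D, CB, CAB, ..., and hence the same input-output description. The sketch below uses randomly generated matrices of our own choosing, not a system from the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, l = 3, 1, 2
A = np.diag([0.5, 0.3, -0.2])            # stable: eigenvalues inside unit circle
B = rng.standard_normal((n, m))
C = rng.standard_normal((l, n))
D = rng.standard_normal((l, m))          # D is left unchanged by the transformation

T = rng.standard_normal((n, n))          # any (almost surely) invertible matrix
At = np.linalg.solve(T, A @ T)           # T^{-1} A T
Bt = np.linalg.solve(T, B)               # T^{-1} B
Ct = C @ T

# Both realizations share the Markov parameters C A^k B.
for k in range(5):
    assert np.allclose(C @ np.linalg.matrix_power(A, k) @ B,
                       Ct @ np.linalg.matrix_power(At, k) @ Bt)
print("same input-output description")
```

This is why 4SID methods only aim to recover the system matrices up to an unknown similarity transformation T.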
5.2.1 4SID Basic Equation

Subspace estimation techniques rely on a model that has an informative part which inherits a low-rank property plus a possible noise part. In 4SID, the model is obtained through a stacking of input-output data. Introduce an lα × 1 vector of stacked consecutive outputs as

yα(t) = (1/√(N − α − β + 1)) [ y(t)^T , y(t + 1)^T , ... , y(t + α − 1)^T ]^T ,   (5.2)

where α is a user-specified integer. To assure a low-rank property, α is required to be larger than the system order n. The scaling factor in (5.2), 1/√(N − α − β + 1), is introduced in order to simplify the definition of sample covariance matrices in what follows. The parameter β is introduced below. Similarly, define vectors of stacked inputs, process noise, and measurement noise as uα(t), wα(t), and vα(t), respectively. Now it is straightforward to iterate the system equations in (5.1) to obtain the equation for the stacked quantities

yα(t) = Γαx(t) + Φαuα(t) + nα(t) ,   (5.3)
where nα(t) = Ψαwα(t) + vα(t) contains the noise and where the following block matrices have been introduced:

Γα = [ C ; CA ; ... ; CA^(α−1) ] ,   (5.4)

Φα = [ D  0  ...  0 ;  CB  D  ...  0 ;  ... ;  CA^(α−2)B  CA^(α−3)B  ...  D ] ,

Ψα = [ 0  0  ...  0 ;  C  0  ...  0 ;  ... ;  CA^(α−2)  ...  C  0 ] ,

where the semicolons separate block rows. Equation (5.3) is the basic model in 4SID. Notice that Γα is the extended observability matrix and is of rank n if the system is observable.
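Equation (5.3) is easy to verify numerically in the noise-free case. The sketch below builds Γα and Φα as in (5.4) for a hypothetical second-order system; the helper names `gamma` and `phi` are ours:

```python
import numpy as np

def gamma(A, C, alpha):
    """Extended observability matrix (5.4): blocks C, CA, ..., CA^(alpha-1)."""
    return np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(alpha)])

def phi(A, B, C, D, alpha):
    """Block lower-triangular Toeplitz matrix of impulse-response coefficients."""
    l, m = D.shape
    M = np.zeros((alpha * l, alpha * m))
    for i in range(alpha):
        for j in range(i + 1):
            blk = D if i == j else C @ np.linalg.matrix_power(A, i - j - 1) @ B
            M[i * l:(i + 1) * l, j * m:(j + 1) * m] = blk
    return M

# Simulate a noise-free second-order system and check
# y_alpha(t) = Gamma_alpha x(t) + Phi_alpha u_alpha(t), cf. (5.3).
rng = np.random.default_rng(1)
A = np.array([[0.6, 0.2], [0.0, 0.4]])   # stable
B = rng.standard_normal((2, 1))
C = rng.standard_normal((1, 2))
D = rng.standard_normal((1, 1))
alpha, N, t0 = 4, 20, 5
u = rng.standard_normal((N, 1))
x = np.zeros(2)
xs, ys = [], []
for t in range(N):
    xs.append(x.copy())
    ys.append(C @ x + D @ u[t])
    x = A @ x + B @ u[t]
y_alpha = np.concatenate([ys[t0 + i] for i in range(alpha)])
u_alpha = np.concatenate([u[t0 + i] for i in range(alpha)])
print(np.allclose(y_alpha,
                  gamma(A, C, alpha) @ xs[t0] + phi(A, B, C, D, alpha) @ u_alpha))
```

The scaling 1/√(N − α − β + 1) of (5.2) is omitted here, since it cancels on both sides of (5.3).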
5.2.2 General Assumptions

Some general assumptions are stated here. More specific assumptions will be made when needed.

• The process and measurement noise are assumed to be stationary zero-mean ergodic white Gaussian random processes with covariance given by

E{ [ w(t) ; v(t) ] [ w(t) ; v(t) ]^T } = [ rw  rwv ; rwv^T  rv ] ,

where E is the expectation operator.

• The measurable input, u(t), is assumed to be a quasi-stationary deterministic sequence [Lju87] uncorrelated with w(t) and v(t). The correlation sequence of u(t) is defined as

ru(s) = Ē{u(t + s)u(t)^T} ,

where Ē is defined as

Ē{·} = lim_(N→∞) (1/N) ∑_(t=1)^(N) E{·} .

Furthermore, assume that the input is persistently exciting (PE) of order α + β [Lju87], where the parameter β is described below.

• The system is assumed to be observable, which implies that the (extended) observability matrix (5.4) has full rank, equal to the system order n.
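A rough numerical check of the persistent-excitation condition is the rank of the sample covariance of the stacked input. The sketch below handles scalar inputs only; the helper name `pe_order` is ours, and the formal definition of PE is the one in [Lju87]:

```python
import numpy as np

def pe_order(u, kmax):
    """Largest k <= kmax such that the scalar sequence u passes the
    sample-covariance rank test for persistent excitation of order k."""
    N = len(u)
    for k in range(kmax, 0, -1):
        Uk = np.column_stack([u[t:t + k] for t in range(N - k + 1)])
        R = Uk @ Uk.T / (N - k + 1)
        if np.linalg.matrix_rank(R) == k:
            return k
    return 0

rng = np.random.default_rng(4)
white = rng.standard_normal(200)   # white noise: PE of any (tested) order
step = np.ones(200)                # a constant input: PE of order 1 only
print(pe_order(white, 5))  # 5
print(pe_order(step, 5))   # 1
```

A constant input thus cannot support the identification of dynamics beyond a static gain, which is why the PE order α + β is required above.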
5.3 Linear Regression

As a first step, the key idea of subspace identification is to estimate the n-dimensional range space of Γα from Equation (5.3). This can be done with more or less clever projections of the data. The traditional framework of 4SID can be a bit difficult in terms of advanced numerical linear algebra, concealing the statistical properties. Here, 4SID will be formulated in a linear regression framework. First we provide some motivation before introducing the linear regression model.

5.3.1 Motivation

Consider the basic equation (5.3). The idea is to replace the unknown state x(t) with an estimate from an optimal stochastic state observer. Let

ˆx(t) = K1yβ(t − β) + K2uβ(t − β) + K3uα(t)   (5.5)

be the linear mean square error optimal estimate of x(t), given yβ(t − β), uβ(t − β) and uα(t). Here, β is a user-specified parameter which clearly can be interpreted as the size of the “memory” of past data used for estimating the state. The estimation error x(t) − ˆx(t) is then orthogonal to yβ(t − β), uβ(t − β) and uα(t). Rewrite (5.3) to give the input-output relation

yα(t) = Γαˆx(t) + Φαuα(t) + Γα(x(t) − ˆx(t)) + nα(t)
     = Γα(K1yβ(t − β) + K2uβ(t − β) + K3uα(t)) + Φαuα(t) + eα(t)
     = L1yβ(t − β) + L2uβ(t − β) + L3uα(t) + eα(t) ,   (5.6)
where we have introduced the notation

[ L1  L2 ] = Γα [ K1  K2 ] ,   L3 = ΓαK3 + Φα ,   (5.7)

and where eα(t) is defined as

eα(t) = Γα(x(t) − ˆx(t)) + nα(t) .

Thus, the “future” outputs, yα(t), are given by certain linear combinations of the data. For example, the combination of past inputs and outputs is constrained to lie in the range space of the extended observability matrix. Also notice that, by construction, the noise term eα(t) is uncorrelated with the other terms on the right hand side of (5.6). From the construction above it is also clear that yα(t) − eα(t) contains the mean-square optimal multistep-ahead predictions of yα(t) given yβ(t − β), uβ(t − β) and uα(t). This is in general an approximation of the true predictions since we only have finite memory. As will be apparent later, another approximation made is that we consider time-independent K-matrices. In [VD94b], the relation between subspace projections and Kalman filtering is discussed in detail.
Remark 5.1. It should be remarked that even though we introduce an estimate of the states in (5.5), it is merely an auxiliary quantity in the elaborations to follow. In this chapter, we consider methods based on an estimate of Γα, and the matrices K1, K2, K3 will be eliminated along the way (see Section 5.4).
Special Cases

Deterministic systems: For the deterministic case with no noise, a dead-beat observer can be constructed which estimates the state exactly in at most n steps. Hence, the state vector can be obtained as

x(t) = K1yβ(t − β) + K2uβ(t − β) ,   (5.8)

for a specific choice of K1 and K2 (K3 = 0).

ARX systems: In [WJ94], the properties of ARX models in 4SID are studied in some detail. Since the predictor for ARX models has finite memory, the state can even in this case be exactly recovered in n steps, i.e.,

x(t) = K1yβ(t − β) + K2uβ(t − β) ,   β ≥ n ,

even if we have noise! Observe that we have the same expression as in the deterministic case, so L3 = Φα. This also implies that eα(t) = nα(t), which is precisely the prediction error in the ARX case (see [WJ94]). Thus, for ARX systems the 4SID output vector, yα(t), consists of the true multistep-ahead predictions plus the prediction errors, i.e.,

yα(t) = [ ˆy(t|t−1)^T , ... , ˆy(t+α−1|t−1)^T ]^T + eα(t) ,

where ˆy(t+k|t−1) denotes the prediction of y(t + k) given data up to time t − 1.
5.3.2 4SID Linear Regression Model

Inspired by the elaborations made above, a linear regression form of the 4SID output equation (5.3) will be introduced. The explicit relation to linear regression will hopefully clarify certain things in 4SID.

Definition. The 4SID linear regression model is defined by

yα(t) = L1yβ(t − β) + L2uβ(t − β) + L3uα(t) + eα(t) .   (5.9)

As seen above, the linear regression model inherits certain properties. The noise vector eα(t) is orthogonal to yβ(t − β), uβ(t − β), and uα(t), and

rank[L1 L2] = n ,   (5.10a)
R[L1 L2] = R[Γα] ,   (5.10b)

where R[·] denotes the range space of [·].
Remark 5.2. While it is obvious that rank[L1 L2] ≤ n, it is not easy to establish explicit conditions under which equality holds. Loosely speaking, this requires the states to be sufficiently excited. We will not elaborate further on this here, but assume (5.10a) to hold true throughout this chapter. A thorough discussion of the rank condition is given in the next chapter.

In [VD94b], a similar idea of replacing the state with an estimate is elaborated. They consider the estimated state-sequence as obtained from a bank of non-steady state Kalman filters and derive explicit expressions for the matrices Li, i = 1, 2, 3, in the case of infinite data.
In 4SID algorithms, it is common to use α = β, and no general rule exists for how to choose them. It is also commonly believed that the performance improves if α and β are increased. However, in this chapter, a simple example is provided to show that this is not always the case. The numerical complexity of 4SID methods increases with the size of α and β; thus, these parameters should be kept as small as possible. The linear regression interpretation (think of the state observers in (5.5) and (5.8), and think also of the ARX case) suggests that β most often could be chosen close to the system order, n, as is also indicated in the numerical examples in Section 5.7. While α is required to be greater than the system order, it is not that obvious what the corresponding condition on β is. The interpretation in terms of state observers suggests that β ≥ n is a necessary condition. However, an instrumental variable interpretation of 4SID (see [VOWL93, Vib94]) indicates that it may suffice to take β ≥ n/(l + m). This can also be seen by inspection of the necessary conditions on the dimensions of the quantities used in the subspace estimation in Section 5.4. The question of the required size of β is closely tied to the rank condition (5.10a); see Remark 5.2 and Chapter 6.
5.4 Subspace Estimation

Most 4SID methods start the estimation procedure by estimating the extended observability matrix. This section discusses this so-called subspace estimation. To be able to compare the effects of different weightings (different algorithms), the estimation of the system poles is considered in Section 5.5.

The idea in this section is to estimate Γα by minimizing two cost-functions which arise naturally from the linear regression formulation. The resulting estimates are then related to previously published estimates. We believe that this comparison/unification aids in understanding the nature of the subspace estimation procedure.

Using (5.9), the observations can be modeled by the matrix equation

Yα = L1Yβ + L2Uβ + L3Uα + Eα ,   (5.11)

where the block Hankel matrices of outputs are defined as

Yβ = [ yβ(1), ..., yβ(N − α − β + 1) ] ,   (5.12a)
Yα = [ yα(1 + β), ..., yα(N − α + 1) ] ,   (5.12b)

and similarly for Uα, Uβ, and Eα. The 4SID linear regression model (5.9) corresponds to an approximate multistep-ahead prediction error model.
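The block Hankel matrices in (5.12) can be assembled as follows. This is a sketch in which the scaling factor 1/√(N − α − β + 1) of (5.2) is omitted, and the helper name `block_hankel` is ours:

```python
import numpy as np

def block_hankel(z, start, rows, cols):
    """Stack `rows` consecutive samples of the sequence z (shape N x l)
    into each column: column k is [z(start+k); ...; z(start+k+rows-1)]."""
    return np.column_stack(
        [np.concatenate(z[start + k : start + k + rows]) for k in range(cols)])

rng = np.random.default_rng(2)
N, l, alpha, beta = 12, 2, 3, 2
y = rng.standard_normal((N, l))
cols = N - alpha - beta + 1
Y_beta = block_hankel(y, 0, beta, cols)        # "past" outputs, size (l*beta, cols)
Y_alpha = block_hankel(y, beta, alpha, cols)   # "future" outputs, size (l*alpha, cols)
print(Y_beta.shape, Y_alpha.shape)             # (4, 8) (6, 8)
```

Note how each output sample appears in both matrices: the split into past (Yβ) and future (Yα) is a shift by β samples, not a partition of the record.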
It is quite natural to base an estimation procedure on the m<strong>in</strong>imization of
126 5 A L<strong>in</strong>ear Regression Approach to <strong>Subspace</strong> <strong>System</strong> <strong>Identification</strong><br />
the prediction error 1 <strong>in</strong> some suitable norm. Thus, consider the prediction<br />
error covariance matrix<br />
Θ(L1,L2,L3) = EαE T α =<br />
N−α+1 �<br />
t=1+β<br />
eα(t)e T α(t) .<br />
where only the explicit dependence on the regression coefficients has been<br />
stressed. In general, the matrices L1, L2 <strong>and</strong> L3 depend on the system<br />
matrices, noise covariances, <strong>and</strong> the <strong>in</strong>itial state. The structure of these<br />
matrices can be <strong>in</strong>corporated <strong>in</strong> the m<strong>in</strong>imization of Θ. However the<br />
result<strong>in</strong>g nonl<strong>in</strong>ear m<strong>in</strong>imization problem is quite complicated.<br />
We will obtain estimates by minimizing two different norms of Θ. In the minimization, the structure of the problem will be relaxed in the sense that L3 is considered to be a completely unknown matrix, and the only restriction on [L1 L2] will be the rank-n constraint of (5.10a). Thus the minimization is a rank-constrained problem; however, it can be transformed into an unconstrained one by the reparameterization (cf. (5.7))

[ L1  L2 ] = ΓαK̄ .   (5.13)

The factorization in (5.13) is clearly not unique since ΓαK̄ = (ΓαT)(T^(−1)K̄) for any invertible T. However, it turns out that this non-uniqueness is
not a problem. The minimization problem to be solved in this section can now be formulated as

min_(Γα, K̄, L3) V(Γα, K̄, L3)

for one of the following two functions:

V(Γα, K̄, L3) = Tr{W Θ(Γα, K̄, L3)}   (5.14)

or

V(Γα, K̄, L3) = det{Θ(Γα, K̄, L3)} .   (5.15)

Here, Tr{·} denotes the trace operator, W is any symmetric positive definite weighting matrix, and det{·} is the determinant function. In an obvious notation, the argument of Θ has been altered according to the

¹For simplicity, the residuals in the linear regression model will be referred to as the prediction error in the sequel.
reparameterization. As already mentioned, we are only interested in an estimate of the extended observability matrix Γα; hence K̄ and L3 can be seen as auxiliary variables.

The next idea is to minimize Θ with respect to L3 and Γα in terms of matrix inequalities. (For two matrices A and B, the inequality A − B ≥ 0 means that A − B is positive semi-definite.) The reason is that these results are valid for any nondecreasing function of Θ and thus specifically for the two choices (5.14) and (5.15) (see [SS89]). To minimize with respect to K̄, however, we have to consider the functions separately. This is done in two subsequent sections.
In order to proceed, it is convenient to introduce the notation

pβ(t) = [ yβ(t) ; uβ(t) ] ,   Pβ = [ Yβ ; Uβ ] ,

L̄ = [ L1  L2 ] = ΓαK̄ ,   Uα† = Uα^T (UαUα^T)^(−1) .
Rewrite Θ as

Θ(Γα, K̄, L3) = (Yα − L̄Pβ − L3Uα)(Yα − L̄Pβ − L3Uα)^T
  = (L3 − (Yα − L̄Pβ)Uα†) UαUα^T (L3 − (Yα − L̄Pβ)Uα†)^T
    + (Yα − L̄Pβ)(Yα − L̄Pβ)^T − (Yα − L̄Pβ)Uα†Uα(Yα − L̄Pβ)^T .   (5.16)
Since UαUα^T is positive definite by assumption, the unconstrained L3 minimizing any nondecreasing function of Θ is given by

ˆL3 = (Yα − L̄Pβ) Uα† .   (5.17)

Let

Π⊥α = I − Uα†Uα
denote the projection matrix on the null-space of Uα. Insert (5.17) <strong>in</strong><br />
(5.16) <strong>and</strong> aga<strong>in</strong> rewrite <strong>in</strong> a similar way us<strong>in</strong>g the factorization ¯ L = Γα ¯ K<br />
$$
\begin{aligned}
\Theta(\Gamma_\alpha, \bar{K})
&= \bigl(Y_\alpha - \Gamma_\alpha \bar{K} P_\beta\bigr)\, \Pi_\alpha^\perp \,\bigl(Y_\alpha - \Gamma_\alpha \bar{K} P_\beta\bigr)^T \\
&= \bigl(\Gamma_\alpha - Y_\alpha \Pi_\alpha^\perp P_\beta^T \bar{K}^T (\bar{K} P_\beta \Pi_\alpha^\perp P_\beta^T \bar{K}^T)^{-1}\bigr)\,
\bar{K} P_\beta \Pi_\alpha^\perp P_\beta^T \bar{K}^T \,
\bigl(\Gamma_\alpha - Y_\alpha \Pi_\alpha^\perp P_\beta^T \bar{K}^T (\bar{K} P_\beta \Pi_\alpha^\perp P_\beta^T \bar{K}^T)^{-1}\bigr)^T \\
&\quad + Y_\alpha \Pi_\alpha^\perp Y_\alpha^T
- Y_\alpha \Pi_\alpha^\perp P_\beta^T \bar{K}^T \bigl(\bar{K} P_\beta \Pi_\alpha^\perp P_\beta^T \bar{K}^T\bigr)^{-1} \bar{K} P_\beta \Pi_\alpha^\perp Y_\alpha^T .
\end{aligned}
\tag{5.18}
$$
The factor K̄PβΠ⊥αPβᵀK̄ᵀ is positive definite by assumption. (This is
because K̄ has full row rank and PβΠ⊥αPβᵀ is positive definite under
weak conditions on the noise and if u(t) is persistently exciting of order
α + β; see [SS89], Complement C5.1.) Hence, if K̄ is given, the estimate
of the extended observability matrix is

$$
\hat{\Gamma}_\alpha = Y_\alpha \Pi_\alpha^\perp P_\beta^T \bar{K}^T \bigl(\bar{K} P_\beta \Pi_\alpha^\perp P_\beta^T \bar{K}^T\bigr)^{-1} \tag{5.19}
$$

and the concentrated form of the covariance matrix equals

$$
\Theta(\bar{K}) = Y_\alpha \Pi_\alpha^\perp Y_\alpha^T
- Y_\alpha \Pi_\alpha^\perp P_\beta^T \bar{K}^T \bigl(\bar{K} P_\beta \Pi_\alpha^\perp P_\beta^T \bar{K}^T\bigr)^{-1} \bar{K} P_\beta \Pi_\alpha^\perp Y_\alpha^T . \tag{5.20}
$$

Until now, the estimates have been independent of the precise norm of Θ
chosen. However, to proceed and minimize with respect to K̄, we have to
consider the two cost-functions (5.14) and (5.15) separately.
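The concentration steps (5.17)-(5.20) are easy to sanity-check numerically. The Python sketch below uses random stand-ins for the data matrices (all dimensions are arbitrary and not taken from the chapter) and verifies that inserting the minimizer (5.19) into the first line of (5.18) reproduces the concentrated form (5.20):

```python
import numpy as np

# Random stand-ins for Y_alpha, P_beta, U_alpha; only the algebra is checked.
rng = np.random.default_rng(0)
al, n, pb, au, N = 6, 2, 5, 4, 60
Ya = rng.standard_normal((al, N))
Pb = rng.standard_normal((pb, N))
Ua = rng.standard_normal((au, N))
Kbar = rng.standard_normal((n, pb))          # a fixed full-row-rank K-bar

Ua_pinv = Ua.T @ np.linalg.inv(Ua @ Ua.T)    # U_alpha^dagger
Pi_a = np.eye(N) - Ua_pinv @ Ua              # projection on null space of U_alpha

# Gamma-hat from (5.19), then Theta(Gamma-hat, Kbar) via the first line of (5.18)
M = Kbar @ Pb @ Pi_a @ Pb.T @ Kbar.T
G = Ya @ Pi_a @ Pb.T @ Kbar.T @ np.linalg.inv(M)
E = Ya - G @ Kbar @ Pb
Theta_5_18 = E @ Pi_a @ E.T

# Concentrated form (5.20)
Theta_5_20 = (Ya @ Pi_a @ Ya.T
              - Ya @ Pi_a @ Pb.T @ Kbar.T @ np.linalg.inv(M) @ Kbar @ Pb @ Pi_a @ Ya.T)

print(np.allclose(Theta_5_18, Theta_5_20))  # True
```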
5.4.1 Weighted Least Squares (WLS)

In this section, we derive the estimate of Γα by minimizing the cost-function
(5.14), i.e., the weighted sum of the residuals in the linear regression
model. This estimate will be called the weighted least squares
(WLS) estimate. Using the concentrated form of Θ, (5.20), the minimizing
K̄ is given by

$$
\hat{\bar{K}} = \arg\min_{\bar{K}} \operatorname{Tr}\bigl\{ W\,\Theta(\bar{K}) \bigr\} . \tag{5.21}
$$
The details of the optimization are deferred to Appendix A.9. The solution
to (5.21) is

$$
\hat{\bar{K}} = \bar{T}^T \hat{V}_s^T \bigl(P_\beta \Pi_\alpha^\perp P_\beta^T\bigr)^{-1/2} , \tag{5.22}
$$

where T̄ is any n × n full rank matrix and where V̂s is given by the SVD

$$
\hat{Q}\hat{S}\hat{V}^T =
\begin{bmatrix} \hat{Q}_s & \hat{Q}_n \end{bmatrix}
\begin{bmatrix} \hat{S}_s & 0 \\ 0 & \hat{S}_n \end{bmatrix}
\begin{bmatrix} \hat{V}_s^T \\ \hat{V}_n^T \end{bmatrix}
= W^{1/2} Y_\alpha \Pi_\alpha^\perp P_\beta^T \bigl(P_\beta \Pi_\alpha^\perp P_\beta^T\bigr)^{-1/2} . \tag{5.23}
$$
Ŝs is a diagonal matrix containing the n largest singular values, and the
columns of V̂s are the corresponding right singular vectors. W^{1/2} is a
symmetric square root of W. Inserting (5.22) in (5.19) yields the WLS
subspace estimate

$$
\begin{aligned}
\hat{\Gamma}_\alpha
&= W^{-1/2} W^{1/2} Y_\alpha \Pi_\alpha^\perp P_\beta^T \bigl(P_\beta \Pi_\alpha^\perp P_\beta^T\bigr)^{-1/2} \hat{V}_s \bar{T} \bigl(\bar{T}^T \hat{V}_s^T \hat{V}_s \bar{T}\bigr)^{-1} \\
&= W^{-1/2} \hat{Q}\hat{S}\hat{V}^T \hat{V}_s \bar{T}^{-T}
= W^{-1/2} \hat{Q}_s \hat{S}_s \bar{T}^{-T} .
\end{aligned}
\tag{5.24}
$$
As mentioned previously, the estimate of the extended observability matrix
concludes the first step in many 4SID techniques. The appearance
of the factor ŜsT̄⁻ᵀ in (5.24) and T̄ᵀ in (5.22) reflects the fact that different
state-space realizations are related by a similarity transformation
(cf. the non-uniqueness in the factorization (5.13)). T̄ does not affect
the minimization but determines which realization is obtained. Some
common choices are T̄ = Ŝs^{1/2} and T̄ = Ŝs.
5.4.2 Approximate Maximum Likelihood (AML)

It is well known that, under certain assumptions, prediction error methods
have the same performance as maximum likelihood (ML) estimators.
More precisely, it is required that the prediction error is Gaussian noise,
white in time, and that the cost-function is properly chosen, e.g., as the
determinant of the prediction error covariance matrix [Lju87, SS89]. However,
in the present problem the prediction errors are not white, and minimizing
the determinant cost-function will not yield the ML estimate. Instead,
the resulting estimate will be called the Approximate ML (AML) estimate.
Actually, there are more profound considerations in an ML approach. This is due to the
rank constraint on the regression coefficients (5.10a). For the derivation
of the maximum likelihood estimator for a reduced-rank linear regression
model, we refer to [SV95, SV96], which also served as inspiration for the
presentation in this section.
The criterion in this section will be chosen as the determinant of
the prediction error covariance matrix, i.e., (5.15). Due to the general
"minimization" of Θ in Section 5.4, the minimizing L̂3 and Γ̂α are given
by (5.17) and (5.19), respectively, with the minimizing K̄ inserted. Using
(5.20), the estimate of K̄ is given by

$$
\hat{\bar{K}} = \arg\min_{\bar{K}} \det\bigl\{ \Theta(\bar{K}) \bigr\} . \tag{5.25}
$$
The details of the minimization are deferred to Appendix A.10. Introducing
the SVD, similar to (5.23),

$$
\hat{Q}\hat{S}\hat{V}^T =
\begin{bmatrix} \hat{Q}_s & \hat{Q}_n \end{bmatrix}
\begin{bmatrix} \hat{S}_s & 0 \\ 0 & \hat{S}_n \end{bmatrix}
\begin{bmatrix} \hat{V}_s^T \\ \hat{V}_n^T \end{bmatrix}
= \bigl(Y_\alpha \Pi_\alpha^\perp Y_\alpha^T\bigr)^{-1/2} Y_\alpha \Pi_\alpha^\perp P_\beta^T \bigl(P_\beta \Pi_\alpha^\perp P_\beta^T\bigr)^{-1/2} , \tag{5.26}
$$

the solution to (5.25) is given by

$$
\hat{\bar{K}} = \bar{T}^T \hat{V}_s^T \bigl(P_\beta \Pi_\alpha^\perp P_\beta^T\bigr)^{-1/2} , \tag{5.27}
$$
where T̄ is any n × n full rank matrix. The AML estimate of Γα is
obtained by using (5.27) in (5.19):

$$
\begin{aligned}
\hat{\Gamma}_\alpha
&= \bigl(Y_\alpha \Pi_\alpha^\perp Y_\alpha^T\bigr)^{1/2} \bigl(Y_\alpha \Pi_\alpha^\perp Y_\alpha^T\bigr)^{-1/2} Y_\alpha \Pi_\alpha^\perp P_\beta^T
\bigl(P_\beta \Pi_\alpha^\perp P_\beta^T\bigr)^{-1/2} \hat{V}_s \bar{T} \bigl(\bar{T}^T \hat{V}_s^T \hat{V}_s \bar{T}\bigr)^{-1} \\
&= \bigl(Y_\alpha \Pi_\alpha^\perp Y_\alpha^T\bigr)^{1/2} \hat{Q}\hat{S}\hat{V}^T \hat{V}_s \bar{T}^{-T}
= \bigl(Y_\alpha \Pi_\alpha^\perp Y_\alpha^T\bigr)^{1/2} \hat{Q}_s \hat{S}_s \bar{T}^{-T} .
\end{aligned}
\tag{5.28}
$$
5.4.3 Discussion

The weighting matrix introduced in the weighted least squares method
is a user choice. A couple of different choices correspond to estimates
previously suggested in the literature.
PO-MOESP: The PO-MOESP method was introduced in [Ver94]. It
is straightforward to verify that choosing W = I gives the same
estimate of Γα (and the same singular values) as in PO-MOESP
(cf. [VD94a, Vib94]).
CVA: The choice W = (YαΠ⊥αYαᵀ)⁻¹ results in the canonical variate
analysis (CVA) solution [Lar83, Lar90]. A natural interpretation
of the algorithmic details of Larimore's method is given in [VD94a]. Here
it is called the CVA solution even though Larimore does not explicitly
use the extended observability matrix. Larimore actually proposes
a generalization of the canonical variate analysis method and
introduces a general weighting matrix corresponding to W in our
notation.
Comparing (5.23) and (5.24) with (5.26) and (5.28), it is clear that
if W = (YαΠ⊥αYαᵀ)⁻¹, the two estimates are identical and equal to the
CVA estimate. This means that AML is a special case of WLS, and it
suffices to consider WLS in the sequel.
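Since the CVA weight makes the right-hand side of (5.23) coincide with that of (5.26), the identity can be checked mechanically. A small Python check with random stand-in data matrices (dimensions arbitrary), T̄ = I, and symmetric square roots:

```python
import numpy as np

def sym_sqrt(S, inv=False):
    """Symmetric square root (or inverse square root) of an SPD matrix."""
    w, V = np.linalg.eigh(S)
    return (V * w**(-0.5 if inv else 0.5)) @ V.T

rng = np.random.default_rng(2)
al, pb, au, N, n = 6, 5, 4, 80, 2
Ya = rng.standard_normal((al, N))
Pb = rng.standard_normal((pb, N))
Ua = rng.standard_normal((au, N))
Pi = np.eye(N) - Ua.T @ np.linalg.solve(Ua @ Ua.T, Ua)

Ryy = Ya @ Pi @ Ya.T
F = Ya @ Pi @ Pb.T @ sym_sqrt(Pb @ Pi @ Pb.T, inv=True)

# WLS (5.23)-(5.24) with the CVA weight W = Ryy^{-1}
Q1, s1, _ = np.linalg.svd(sym_sqrt(Ryy, inv=True) @ F)
G_wls = sym_sqrt(Ryy) @ Q1[:, :n] @ np.diag(s1[:n])   # W^{-1/2} Qs Ss

# AML (5.26)-(5.28)
Q2, s2, _ = np.linalg.svd(sym_sqrt(Ryy, inv=True) @ F)
G_aml = sym_sqrt(Ryy) @ Q2[:, :n] @ np.diag(s2[:n])   # (Ya Pi Ya^T)^{1/2} Qs Ss

print(np.allclose(G_wls, G_aml))  # True: the two estimates coincide
```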
In [VD94a], an interpretation of W as a shaping filter in the frequency
domain is given. More precisely, W contains impulse response coefficients
of a shaping filter specifying important frequency bands. These ideas are
further extended in [VD95a].
A further observation on the selection of the weighting matrix, W,
concerns the model order selection. The estimate of the model order
is often obtained by studying the singular values in (5.23), and W clearly
affects the singular value distribution. Thus, it would be desirable if W could
increase the gap between the n dominating singular values and the rest.
In [Lar94], it is claimed, on the basis of a numerical example, that CVA is
optimal (ML) for model order selection.
A popular subspace identification algorithm is the N4SID method
[VD94b]. Hence, a short note on the relation of the linear regression
framework to the estimate in N4SID is warranted. The N4SID estimate
of Γα cannot be obtained directly from our WLS solution because of
the rank constraint on L̄. In N4SID, this constraint is invoked at a later
stage. This may be seen as follows. Consider the unconstrained least
squares estimates of L̄ and L3 from the cost-function Tr{Θ}:
$$
\hat{\bar{L}} = Y_\alpha \Pi_\alpha^\perp P_\beta^T \bigl(P_\beta \Pi_\alpha^\perp P_\beta^T\bigr)^{-1} , \tag{5.29a}
$$
$$
\hat{L}_3 = Y_\alpha \Pi_{P_\beta}^\perp U_\alpha^T \bigl(U_\alpha \Pi_{P_\beta}^\perp U_\alpha^T\bigr)^{-1} , \tag{5.29b}
$$

where Π⊥_{Pβ} = I − Pβᵀ(PβPβᵀ)⁻¹Pβ is the projection onto the null space
of Pβ. Define the least squares residuals by

$$
\hat{E}_\alpha = Y_\alpha - \hat{\bar{L}} P_\beta - \hat{L}_3 U_\alpha ,
$$

where Êα is orthogonal to Pβ and Uα. Rewrite the true residuals as

$$
E_\alpha = Y_\alpha - \bar{L} P_\beta - L_3 U_\alpha
= \hat{E}_\alpha - \bigl(\bar{L} - \hat{\bar{L}}\bigr) P_\beta - \bigl(L_3 - \hat{L}_3\bigr) U_\alpha .
$$
Due to the orthogonality, the cost-function reads

$$
\operatorname{Tr}\{\Theta\} = \operatorname{Tr}\bigl\{E_\alpha E_\alpha^T\bigr\} = \bigl\|E_\alpha\bigr\|_F^2
= \bigl\|\hat{E}_\alpha\bigr\|_F^2
+ \Bigl\| \bigl(\bar{L} - \hat{\bar{L}}\bigr) P_\beta + \bigl(L_3 - \hat{L}_3\bigr) U_\alpha \Bigr\|_F^2 ,
$$
where ‖·‖F denotes the Frobenius norm. Hence, if L3 is taken as the (unconstrained)
least squares estimate in (5.29b) and we minimize under the
constraint that the rank of L̄Pβ should be n, the approximation is obtained
from the SVD of L̂̄Pβ, which is exactly what is used in N4SID.
Clearly, the difference between N4SID and our derivation is the stage at which
the rank constraint is used. It is expected that the estimate should be
more accurate if the constraint is invoked as early as possible.
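The orthogonality argument above can be verified directly: for an arbitrary (L̄, L3), the total residual energy splits into the least squares residual energy plus the energy of the deviation term. A Python check with random stand-ins (no system structure is needed for the identity):

```python
import numpy as np

rng = np.random.default_rng(3)
al, pb, au, N = 5, 4, 3, 50
Ya = rng.standard_normal((al, N))
Pb = rng.standard_normal((pb, N))
Ua = rng.standard_normal((au, N))

Z = np.vstack([Pb, Ua])                       # joint regressor [P_beta; U_alpha]
Theta_ls = Ya @ Z.T @ np.linalg.inv(Z @ Z.T)  # joint LS, i.e. (5.29a)-(5.29b)
Lb_hat, L3_hat = Theta_ls[:, :pb], Theta_ls[:, pb:]
E_hat = Ya - Lb_hat @ Pb - L3_hat @ Ua        # LS residuals, orthogonal to Pb, Ua

Lb = rng.standard_normal((al, pb))            # arbitrary (L-bar, L3)
L3 = rng.standard_normal((al, au))
E = Ya - Lb @ Pb - L3 @ Ua
lhs = np.linalg.norm(E, 'fro')**2
rhs = (np.linalg.norm(E_hat, 'fro')**2
       + np.linalg.norm((Lb - Lb_hat) @ Pb + (L3 - L3_hat) @ Ua, 'fro')**2)
print(np.isclose(lhs, rhs))  # True
```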
5.5 Pole Estimation

Given an estimate of the extended observability matrix Γα, different
approaches for deriving the system matrices exist; see [Kun78, Ver94,
McK94b, VD94b, Van95]. This chapter focuses on the estimation of the
poles. The reasons for this are twofold:

• For comparing the quality of the estimates from different methods,
we need to consider a canonical parameter set. Perhaps the
simplest and most useful such set is the system poles.

• Initial work on the performance of 4SID methods has been reported
in [VOWL91, VOWL93, OV94], and the results presented therein
rely on the pole estimation error. We will draw heavily on their
results.

In this section, we introduce two pole estimation methods appearing in
the literature. The asymptotic (in the number of data samples N) properties
of these two methods are then analyzed in Section 5.6.
5.5.1 The Shift Invariance Method

For deriving the system matrix, A, a straightforward method is to use
the shift invariance structure of Γα [Kun78, HK66]. Define the selection
matrices

$$
J_1 = \begin{bmatrix} I_{(\alpha-1)l} & 0_{(\alpha-1)l \times l} \end{bmatrix}, \qquad
J_2 = \begin{bmatrix} 0_{(\alpha-1)l \times l} & I_{(\alpha-1)l} \end{bmatrix},
$$
where In denotes the n × n identity matrix and 0n×m denotes an n × m
matrix of zeros. Then the estimate of the system matrix is

$$
\hat{A} = \bigl(J_1 \hat{\Gamma}_\alpha\bigr)^\dagger \bigl(J_2 \hat{\Gamma}_\alpha\bigr) , \tag{5.30}
$$
where (·)† denotes the Moore-Penrose pseudo-inverse of (·). Let µ be a
vector with components µk, k = 1, ..., n, where µk denotes the kth pole
of the true system. The estimated poles are then, of course, given by the
eigenvalues of Â, i.e.,

$$
\hat{\mu} = \text{the eigenvalues of } \hat{A} . \tag{5.31}
$$

5.5.2 Subspace Fitting
The second pole estimation technique considered was proposed in [OV94]
(see also [SROK95]). Let Γα(θ) denote the parameterized observability
matrix of some realization. The estimated observability matrix, Γ̂α, may
then be written as

$$
\hat{\Gamma}_\alpha = \Gamma_\alpha(\theta)\,T + \tilde{\Gamma}_\alpha , \tag{5.32}
$$

where T denotes an unknown similarity transformation matrix and Γ̃α
is the error. The idea in the subspace fitting approach is to obtain parameter
estimates so as to achieve the best fit between the two subspaces
spanned by the columns of Γα(θ) and Γ̂α. To be more specific, the
unknown parameters (θ and the elements in T) can be estimated by
minimizing some suitable norm of Γ̃α, for example,

$$
\bigl\| \operatorname{vec}\bigl(\hat{\Gamma}_\alpha - \Gamma_\alpha(\theta)\,T\bigr) \bigr\|_\Phi^2 . \tag{5.33}
$$

Here, vec(A) denotes the vector obtained by stacking the columns of A,
Φ is a positive definite weighting matrix, and ‖x‖²_Φ = xᵀΦx. In [OV94],
it was shown that, for minimum variance parameter estimates (of θ), we
may equivalently study the following minimization problem:
$$
\hat{\theta} = \arg\min_\theta V(\theta) , \tag{5.34}
$$

where V(θ) is defined by²

$$
V(\theta) = \operatorname{vec}^T\bigl(\Pi^\perp_{W^{1/2}\Gamma_\alpha} \hat{Q}_s\bigr)\,\Lambda\,
\operatorname{vec}\bigl(\Pi^\perp_{W^{1/2}\Gamma_\alpha} \hat{Q}_s\bigr) \tag{5.35}
$$

for some symmetric positive semi-definite weighting matrix Λ. Here,

$$
\Pi^\perp_{W^{1/2}\Gamma_\alpha} = I - W^{1/2}\Gamma_\alpha \bigl(\Gamma_\alpha^* W \Gamma_\alpha\bigr)^{-1} \Gamma_\alpha^* W^{1/2} \tag{5.36}
$$

²We use the notation V(·) to denote different cost-functions; we believe that this
will not cause any confusion.
is a projection matrix onto the null space of (W^{1/2}Γα)*, and (·)* denotes
the complex conjugate transpose. The argument (θ) has been suppressed
for notational convenience.
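As an illustration, the cost (5.35)-(5.36) can be evaluated for a toy scalar-output system in diagonal realization, parameterized by its (real, distinct) poles. The choices W = I, Λ = I, and an exact basis Q̂s are simplifying assumptions of this sketch:

```python
import numpy as np

def gamma(poles, alpha):
    """Extended observability matrix of the diagonal realization with C = [1 ... 1]."""
    A = np.diag(poles)
    C = np.ones((1, len(poles)))
    return np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(alpha)])

def cost(poles, Qs, alpha):
    G = gamma(poles, alpha)
    Pi = np.eye(G.shape[0]) - G @ np.linalg.solve(G.T @ G, G.T)  # (5.36) with W = I
    r = (Pi @ Qs).ravel(order='F')                               # vec(Pi-perp Qs)
    return float(r @ r)                                          # (5.35) with Lambda = I

alpha = 6
true_poles = np.array([0.8, 0.3])
Qs, _ = np.linalg.qr(gamma(true_poles, alpha))   # exact signal-subspace basis

print(cost(true_poles, Qs, alpha))               # essentially zero at the truth
print(cost(np.array([0.7, 0.4]), Qs, alpha))     # strictly positive away from it
```

With a noisy Q̂s the cost no longer vanishes at the true poles, and the weighting Λ determines the statistical efficiency of the minimizer, which is the subject of Section 5.6.3.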
Remark 5.3. A few comments about the formulation in (5.34) follow. The
cost-function (5.35) can be formulated in different ways leading to similar
(or identical) results. The choice made herein differs slightly from [OV94].
Instead of Π⊥_{W^{1/2}Γα}Q̂s, we could equally well consider Π⊥_{Γα}W^{-1/2}Q̂s or
PᵀW^{-1/2}Q̂s, where the columns of P are a parameterized basis for the
null space of Γα*(θ); see [OV94, VWO97]. See Section 5.6.3 for further
comments.
Regarding the parameterization, Γα(θ) is considered to be parameterized
in any way such that the true d-dimensional parameter vector,
θ0, is uniquely given by the identity Π⊥_{W^{1/2}Γα(θ0)}Qs = 0. In particular,
Γα in the diagonal realization can be parameterized by the system
poles and, for multi-output systems, possibly some of the coefficients in
C; see [OV94]. Notice that the parameterization should be such that the
projection matrix in (5.36) is real valued. We also refer to [VWO97] for
an interesting discussion of the parameterization issue. Therein, it
is shown that the null space of Γα* can be linearly parameterized, which
implies that the estimator (5.34) can be implemented in a non-iterative
(two-step) fashion (see [OV94, VWO97]).
5.6 Performance Analysis

In this section, the asymptotic performance of the two pole estimation
methods described in Section 5.5 is analyzed. A short discussion of
consistency is given, whereafter the asymptotic distribution of the pole
estimates is derived.
5.6.1 Consistency

Consider the subspace estimate obtained through the WLS minimization
in Section 5.4.1 (as mentioned previously, this covers the AML case as
well). Thus, we assume the estimate of the extended observability matrix
to be given by

$$
\hat{\Gamma}_\alpha = W^{-1/2} \hat{Q}_s , \tag{5.37}
$$

where Q̂s is obtained from the SVD (cf. (5.23))

$$
\hat{Q}_s \hat{S}_s \hat{V}_s^T + \hat{Q}_n \hat{S}_n \hat{V}_n^T
= W^{1/2} Y_\alpha \Pi_\alpha^\perp P_\beta^T \bigl(P_\beta \Pi_\alpha^\perp P_\beta^T\bigr)^{-1/2} . \tag{5.38}
$$
. (5.38)<br />
In the asymptotic analysis, we consider W to be a fix full rank matrix or,<br />
<strong>in</strong> the data dependent case, a consistent estimate of a full rank matrix.<br />
Let Γ • α denote the extended observability matrix of a fix realization of<br />
(5.1) <strong>and</strong> let<br />
X = [x(1 + β) x(2 + β) ... x(N − α + 1)] .<br />
be the correspond<strong>in</strong>g state trajectory. In the analysis of the shift <strong>in</strong>variance<br />
method we let Γ • α be the extended observability matrix of the<br />
diagonal realization (Γ • α = Γ d α), whereas Γ • α = Γα(θ0) is the natural<br />
choice <strong>in</strong> the analysis of the subspace fitt<strong>in</strong>g method. Next notice that<br />
the block Hankel matrix of outputs from a fixed realization of (5.1) is<br />
given by (cf. (5.3))<br />
Yα = Γ • αX + ΦαUα + Nα<br />
(5.39)<br />
where Nα is defined similarly to (5.12b). By assumption, the noise is
uncorrelated with the past inputs and outputs and with the future inputs.
By ergodicity, this implies that

$$
\lim_{N\to\infty} N_\alpha U_\alpha^T = 0 , \tag{5.40}
$$
$$
\lim_{N\to\infty} N_\alpha P_\beta^T = 0 , \tag{5.41}
$$

where the limits are interpreted in terms of convergence w.p.1. Observe
that all signals are assumed to be properly scaled such that multiplications
as in (5.41) converge to the corresponding covariance. Using these
observations, the following limit

$$
\lim_{N\to\infty} W^{1/2} Y_\alpha \Pi_\alpha^\perp P_\beta^T \bigl(P_\beta \Pi_\alpha^\perp P_\beta^T\bigr)^{-1/2}
= W^{1/2} \Gamma_\alpha^\bullet R^\bullet \tag{5.42}
$$

is obtained (w.p.1) where, for notational convenience, we have defined

$$
R^\bullet = \bigl[ R_{xp}(\beta) - R_{xu}(\beta) R_u^{-1} R_{up} \bigr]
\bigl[ R_p - R_{up}^T R_u^{-1} R_{up} \bigr]^{-1/2}
$$

and where

$$
\begin{aligned}
R_{xp}(\tau) &= \bar{E}\bigl\{ x(t+\tau)\, p_\beta^T(t) \bigr\} , \\
R_{xu}(\tau) &= \bar{E}\bigl\{ x(t+\tau)\, u_\alpha^T(t+\beta) \bigr\} , \\
R_u &= \bar{E}\bigl\{ u_\alpha(t)\, u_\alpha^T(t) \bigr\} , \\
R_{up} &= \bar{E}\bigl\{ u_\alpha(t+\beta)\, p_\beta^T(t) \bigr\} .
\end{aligned}
\tag{5.43}
$$
Notice that the covariances involving system states are well defined since
the realization is fixed in the analysis. The notation R• is in correspondence
with the notation for Γ•α (Γdα ↔ Rᵈ, Γα(θ0) ↔ R^{θ0}). Let
QsSsVsᵀ be the SVD of the right-hand side of (5.42) or, equivalently,
the limit of the SVD in (5.38). Assuming that the matrix R• in (5.42) has
full rank equal to n, it is clear that the range space of Qs (Q̂s → Qs as
N → ∞) is a consistent estimate of the range space of W^{1/2}Γ•α. Thus,
(5.37) is a consistent estimate of the extended observability matrix of
some realization. Any reasonable choice of estimator for C and A would
then provide consistent estimates. In particular, the two pole estimation
methods introduced in Section 5.5 will give consistent estimates. Observe
that the consistency result in [PSD95] does not apply here since we consider
a fixed β. For a more complete analysis along the lines of this section,
see Chapter 6.
5.6.2 Analysis of the Shift Invariance Method

In this section, the asymptotic properties of the pole estimates from the
shift invariance method are analyzed. These results are a straightforward
extension of the analysis in [VOWL91, VOWL93] (see also [SS91, RH89]).
The modification is needed because the subspace estimate is different and
because a weighting (W) is included. A direct application of the shift
invariance method can, in some cases, lead to quite strange behavior of
the accuracy of the estimates as a function of the dimension parameter α
(see [WJ94]). However, in [WJ94] the weighting matrix, W, was not included
in the analysis. It is therefore interesting to investigate the
effect of the weighting on the pole estimation error. One could conjecture
that a proper choice of such a weighting (for instance, a weighting
that makes the residuals white) would improve the accuracy of the pole
estimates. However, a quite surprising result obtained below shows that
this is not the case: the weighting does not affect the asymptotic pole
estimation error.
In order to state the results of this section, we need the following
notation:

$$
\begin{aligned}
\tilde{\mu}_k &= \hat{\mu}_k - \mu_k , \\
R_{n_\alpha}(\tau) &= \bar{E}\bigl\{ n_\alpha(t+\tau)\, n_\alpha^T(t) \bigr\} , \\
\zeta(t) &= \bigl[\, y_\beta^T(t) \;\; u_\beta^T(t) \;\; u_\alpha^T(t+\beta) \,\bigr]^T
= \bigl[\, p_\beta^T(t) \;\; u_\alpha^T(t+\beta) \,\bigr]^T , \\
R_\zeta(\tau) &= \bar{E}\bigl\{ \zeta(t+\tau)\, \zeta^T(t) \bigr\} , \\
f_k^* &= \Bigl[ \bigl(J_1 \Gamma_\alpha^d\bigr)^\dagger \bigl(J_2 - J_1 \mu_k\bigr) \Bigr]_{k\bullet} ,
\end{aligned}
\tag{5.44}
$$

$$
R_H = \begin{bmatrix} I \\ -R_u^{-1} R_{up} \end{bmatrix}
\bigl[ R_p - R_{up}^T R_u^{-1} R_{up} \bigr]^{-1/2} , \qquad
\Omega_d = R_H R^{d\dagger} ,
\tag{5.45}
$$

where A_{k•} (A_{•k}) denotes row (column) k of the matrix A. The notation
(·)ᶜ will be used to denote complex conjugation.

Theorem 5.1. Let the subspace estimate be given by (5.37)-(5.38) and
let the estimates of the system poles be as in (5.31). Then √N µ̃k is
asymptotically zero-mean Gaussian with second moments

$$
\lim_{N\to\infty} N\, E\{\tilde{\mu}_k \tilde{\mu}_l^c\}
= \sum_{|\tau|<\infty} \bigl[(\Omega_d)_{\bullet k}\bigr]^T R_\zeta(\tau) \bigl[(\Omega_d)_{\bullet l}\bigr]^c
\, f_k^* R_{n_\alpha}(\tau) f_l , \tag{5.46a}
$$
Remark 5.4. Corollary 5.2 is in fact true for a larger class of subspace
estimates than considered in this chapter, e.g., the set of subspace estimates
obtained from the unifying theorem in [VD94a].

Remark 5.5. One should keep in mind that Corollary 5.2 holds for large
data records. For finite data, the weighting may indeed improve the
estimates. Furthermore, the weighting can be useful for other reasons,
e.g., model order selection, finite sample bias shaping, and for coping
with un-modeled dynamics. We also remind the reader that the result
only concerns the pole estimates.
5.6.3 Analysis of the Subspace Fitting Method

In this section, we analyze the asymptotic properties of the parameter
estimates from the subspace fitting method (5.34). In [OV94], this estimation
problem is solved by invoking the general theory of asymptotically
best consistent estimation of nonlinear regression parameters (for
instance, see [SS89]). Furthermore, the projection matrix is reparameterized
in the coefficients of the characteristic polynomial of A, and a
non-iterative procedure for solving (5.34) is described. To some extent,
the results of the analysis below parallel those in [OV94].
Specifically, the optimal weighted method derived below is the same as
in [OV94], though phrased slightly differently and tailored to our subspace
estimate, (5.37)-(5.38), with the weight W included. Our derivation also
provides an expression for the parameter estimation error covariance with
a general weighting matrix, not just for the optimal weighting. This will
make it possible to compare different weightings. Moreover, the results
concern parameters in the extended observability matrix explicitly. This
makes it possible to directly apply the result to the pole estimation error
(see Section 5.7).
In Appendix A.12, the asymptotic distribution of the estimates from
(5.34) is derived through a standard Taylor series expansion approach
[Lju87, SS89]. Before presenting the main results of this analysis, we
define

$$
G = \Bigl( \bigl[(W^{1/2}\Gamma_\alpha)^\dagger Q_s\bigr]^T \otimes \Pi^\perp_{W^{1/2}\Gamma_\alpha} W^{1/2} \Bigr)
\bigl[\, \operatorname{vec}(\Gamma_{\alpha_1}) \;\; \operatorname{vec}(\Gamma_{\alpha_2}) \;\; \ldots \;\; \operatorname{vec}(\Gamma_{\alpha_d}) \,\bigr] , \tag{5.47}
$$

$$
\Sigma = \sum_{|\tau|<\infty} \bigl[ H^T R_\zeta(\tau) H \bigr]
\otimes \bigl[ \Pi^\perp_{W^{1/2}\Gamma_\alpha} W^{1/2} R_{n_\alpha}(\tau) W^{1/2} \Pi^\perp_{W^{1/2}\Gamma_\alpha} \bigr] ,
$$
as

$$
\lim_{N\to\infty} N\, E\bigl\{\tilde{\theta}\tilde{\theta}^T\bigr\}
= \bigl(G^T \Sigma^\dagger G\bigr)^{-1} .
$$
Remark 5.8. As the optimally weighted subspace fitting method has been
formulated here, it is not particularly well suited for implementation.
We refer to [OV94, VWO97] for further comments and details about the
implementation.
In the analysis above, it was assumed that a subspace estimate was
given, from which the parameters were estimated in a second step. One
could ask why the parameterization of Γα is not introduced directly in
(5.14), before a specific subspace estimate is derived. The rest of this
section is concerned with the analysis of the accuracy of the so-obtained
estimates. The derived results also lead to an interesting connection to a
previously published 4SID parameter estimation algorithm [SROK95]
(see Remark 5.10).
If the cost-function (5.14) is minimized with respect to L3 and then
with respect to K̄, the following estimate is obtained:

$$
\hat{\bar{K}} = \bigl(\Gamma_\alpha^T W \Gamma_\alpha\bigr)^{-1} \Gamma_\alpha^T W Y_\alpha \Pi_\alpha^\perp P_\beta^T \bigl(P_\beta \Pi_\alpha^\perp P_\beta^T\bigr)^{-1} .
$$

The concentrated cost-function, omitting terms independent of Γα(θ),
reads

$$
V(\theta) = \operatorname{Tr}\Bigl\{ \Pi^\perp_{W^{1/2}\Gamma_\alpha} W^{1/2} Y_\alpha \Pi_\alpha^\perp P_\beta^T
\bigl(P_\beta \Pi_\alpha^\perp P_\beta^T\bigr)^{-1} P_\beta \Pi_\alpha^\perp Y_\alpha^T W^{1/2} \Bigr\} . \tag{5.54}
$$

Consider the parameter estimates obtained through

$$
\hat{\theta}_1 = \arg\min_\theta V(\theta) . \tag{5.55}
$$
An interesting question is how this estimate is related to the one from
(5.34). This can be answered (asymptotically) by using the following
result.

Theorem 5.5. The estimator (5.55) is asymptotically equivalent to

$$
\hat{\theta}_2 = \arg\min_\theta \operatorname{Tr}\bigl\{ \Pi^\perp_{W^{1/2}\Gamma_\alpha} \hat{Q}_s S_s^2 \hat{Q}_s^T \bigr\} , \tag{5.56}
$$

i.e., √N(θ̂1 − θ̂2) → 0 in probability as N → ∞, implying that θ̂1 and
θ̂2 have the same asymptotic distribution.

Proof. The proof is given in Appendix A.13.
Now, it is possible to rewrite the cost-function in (5.56) to yield

$$
\operatorname{vec}^T\bigl(\Pi^\perp_{W^{1/2}\Gamma_\alpha}\hat{Q}_s\bigr)\,\bigl(S_s^2 \otimes I\bigr)\,
\operatorname{vec}\bigl(\Pi^\perp_{W^{1/2}\Gamma_\alpha}\hat{Q}_s\bigr) , \tag{5.57}
$$

where the facts that Tr{AB} = vecᵀ(Aᵀ)vec(B) and (A.76) have been
used. Comparing (5.57) and (5.35), it is obvious that (5.57) is a special
case with

$$
\Lambda = S_s^2 \otimes I . \tag{5.58}
$$

The following theorem is then immediate.
Theorem 5.6. The estimator in (5.55) yields less accurate estimates
than (5.34) with the optimal weighting (5.52).

Remark 5.9. The performance comparison in Theorem 5.6 is not completely
proper since the optimally weighted estimator is considerably
more involved (the norms are different). Of course, if we decide to estimate
the parameters directly from (5.11), we can do better than (5.55).
However, it is interesting that in this way we obtain an expression for the
asymptotic covariance of the estimates from (5.55).
Remark 5.10. The last two results should also be compared with the subspace formulation in [SROK95]. They consider a weighted subspace fitting method applied to a different subspace estimate than the one used here. However, if their approach is applied to the estimate herein, the estimator becomes (5.56), and the performance is given by (5.51) with the weight (5.58).
Remark 5.11. It is shown in [Jan95] that the estimator (5.56) also has the same asymptotic performance as when only the term \tau = 0 in (5.52) is used as weight in (5.34) and W = (Y_\alpha \Pi^\perp_\alpha Y_\alpha^T)^{-1}.
Remark 5.12. Our experience is that the accuracy of the estimates from (5.56) is in general the same as or worse than what is obtained by the shift invariance method. This is also the reason why this method is not included in the numerical examples in Section 5.7.
5.7 Examples
In this section, we provide two examples in order to compare the shift invariance and the optimally weighted subspace fitting methods.
Example 5.1. As briefly mentioned in Section 5.5, a rather strange behavior of the pole estimation accuracy as a function of the prediction horizon parameter α has been noted in [WJ94]. Therein, the system poles were estimated by the shift invariance method, i.e., by (5.31). It was noted that the accuracy does not necessarily increase monotonically with a larger α; it can, in fact, even decrease. The example below clearly shows the importance of proper weighting when recovering the poles from the subspace estimate. The theoretical expression for the estimation error covariance of the weighted subspace fitting estimator (5.53) will be compared with the corresponding result for the shift invariance method (5.46).
Consider the same example as in [WJ94, JW95]. The true system is described by the first-order state-space model

x(t + 1) = 0.7\,x(t) + u(t) + w(t) ,   (5.59a)
y(t) = x(t) + v(t) .   (5.59b)
The measurable input, u(t), is a realization of a white Gaussian process with variance r_u = 1. The process noise, w(t), and the measurement noise, v(t), are mutually independent white Gaussian processes, uncorrelated with u(t). The variance of v(t) is fixed to r_v = 1, while the variance of the process noise takes on two different values, namely r_w = 0.2 and r_w = 2. The first choice makes the system approximately of output error type, while the second makes it approximately ARX.
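As a small illustration (not part of the original example, which works with asymptotic expressions), the system (5.59) can be simulated as follows; the function name and Monte Carlo setup are ours:

```python
import numpy as np

# Simulate (5.59): x(t+1) = 0.7 x(t) + u(t) + w(t), y(t) = x(t) + v(t),
# with ru = rv = 1 and rw in {0.2, 2} as in the example.
def simulate(N, rw, ru=1.0, rv=1.0, a=0.7, seed=0):
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, np.sqrt(ru), N)
    w = rng.normal(0.0, np.sqrt(rw), N)
    v = rng.normal(0.0, np.sqrt(rv), N)
    x, y = 0.0, np.empty(N)
    for t in range(N):
        y[t] = x + v[t]
        x = a * x + u[t] + w[t]
    return u, y

u, y = simulate(10000, rw=0.2)
# Stationary state variance: rx = (ru + rw)/(1 - a^2) ~ 2.35,
# so var{y} = rx + rv should be close to 3.35.
print(y.var())
```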
In order to use (5.34), a parameterization of \Gamma_\alpha has to be chosen. If C is fixed to 1, \Gamma_\alpha takes the form

\Gamma_\alpha = [\, 1 \;\; \lambda \;\; \lambda^2 \;\; \cdots \;\; \lambda^{\alpha-1} \,]^T ,

where \lambda is the pole to be estimated. For details about the implementation of the subspace fitting method, we refer to [OV94].
The theoretical expressions (5.46) and (5.53) will be compared for different variances of the process noise and different combinations of α and β. Fig. 5.1 shows the normalized (by \sqrt{N}) asymptotic standard deviation of the pole estimate for the shift invariance method. The two subplots correspond to the two process noise variances. Next, the normalized asymptotic standard deviation of the pole estimate from (5.53) is plotted in Fig. 5.2. It is seen that the degradation in accuracy when increasing α for high process noise variance in Fig. 5.1 is overcome in the optimal weighting case, Fig. 5.2. The shift invariance method does
Figure 5.1: The basic shift invariance approach. Normalized asymptotic standard deviation of the estimated pole as a function of α and β. In the first plot, the variance of the process noise is r_w = 0.2. In the second, it is r_w = 2. For comparison, the Cramér-Rao lower bounds are 0.506 and 0.830, respectively.
Figure 5.2: The optimally weighted subspace fitting approach. Normalized asymptotic standard deviation of the estimated pole as a function of α and β. In the first plot, the variance of the process noise is r_w = 0.2. In the second, it is r_w = 2. The Cramér-Rao lower bounds are 0.506 and 0.830, respectively.
not have any inherent weighting compensating for bad scaling, and thus it is impossible to give any rule for the choice of the prediction horizon α. In the weighted subspace fitting method, the weighting ensures a more robust, or consistent, behavior. The performance clearly improves with α, but the computational burden increases, and a trade-off is necessary. As for the choice of the observer dimension, β, it is observed that the performance is quite independent of β. Thus, a choice close to the system order is recommended, since the computational cost increases with β. The choice of input variance is of no major importance for the point to be made; this value determines (approximately) the level of the curves, not their shape. The shapes of the curves are determined by the relation between process and measurement noise in combination with how well damped the system is.
For this example, it is possible to calculate a lower bound on the variance of the estimation error for any unbiased estimator of the pole, the Cramér-Rao lower bound (CRB). This may be done by converting the system (5.59) to a first-order ARMAX system with second-order properties identical to those of the model (5.59). The CRB for the parameters in this ARMAX model is then readily calculated (see, e.g., [SS89]). For this example, the bounds on the pole estimation error standard deviation were calculated to 0.506 and 0.830 for the low and high process noise, respectively. An interesting observation is that the CRB seems to be attained, for large α and β, by the method which uses the complete structure of \Gamma_\alpha.
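The ARMAX conversion can be sketched as follows; the fixed-point Riccati iteration and all names are our own construction, and the CRB evaluation itself (see [SS89]) is not reproduced:

```python
# Convert (5.59) to an equivalent first-order ARMAX (innovations) model
#   y(t) - a y(t-1) = u(t-1) + e(t) + c e(t-1)
# by solving the scalar stationary Riccati equation of the Kalman filter.
def innovations_form(a, rw, rv, iters=200):
    p = rw                                 # state error variance
    for _ in range(iters):                 # fixed-point Riccati iteration
        p = a * a * p + rw - (a * p) ** 2 / (p + rv)
    k = a * p / (p + rv)                   # Kalman gain
    return k - a, p + rv                   # MA coefficient c, innovations variance

c, re = innovations_form(0.7, 0.2, 1.0)    # low process noise case
print(c, re)  # c ~ -0.53, re ~ 1.32
```

The MA root stays inside the unit circle (|c| < 1), so the ARMAX model is invertible and standard CRB formulas apply.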
Example 5.2. For white input signals, the pole estimators based on \hat{\Gamma}_\alpha from either N4SID or PO-MOESP give the same asymptotic accuracy. This equivalence does not hold for colored input signals. In this example, we study the difference between the subspace estimates (\hat{\Gamma}_\alpha) from N4SID and PO-MOESP by comparing the asymptotic pole estimation error variance. The expressions for the variance in the N4SID case are given in [VOWL93] for the shift invariance method and in [OV94] for the weighted subspace fitting method. (Observe that the N4SID subspace estimate is equivalent to the instrumental variable subspace estimate in [VOWL93, OV94]; see [VD94a].)

Let the system be as in Example 5.1. We consider the low process noise scenario, i.e., r_v = 1 and r_w = 0.2. The difference in this example is that the input is a colored signal. More precisely, the input signal u(t)
is generated as

u(t) = 0.3\,\frac{q^2 + 2q + 1}{q^2 - 0.17}\,\eta(t) + \nu(t) ,

where q is the forward shift operator. The zero-mean white noise processes \eta(t) and \nu(t) are independent and have the variances r_\eta = 10 and r_\nu = 10^{-4}, respectively. Consider the pole estimates from the shift invariance and the optimally weighted subspace fitting method based on the PO-MOESP estimate of \Gamma_\alpha. In Fig. 5.3, the normalized asymptotic standard deviations of the pole estimates are shown as functions of α and β. We see that the variances are still approximately constant with respect to β. Also notice that the variance of the weighted subspace fitting estimates does not decay monotonically with increasing α.
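The colored input above can be generated with a standard IIR filter; this snippet is an illustration in our own notation, not code from the thesis:

```python
import numpy as np
from scipy.signal import lfilter

# Generate u(t) = 0.3 (q^2 + 2q + 1)/(q^2 - 0.17) eta(t) + nu(t),
# i.e., the filter 0.3 (1 + 2 q^-1 + q^-2)/(1 - 0.17 q^-2) driven by eta.
rng = np.random.default_rng(0)
N = 20000
eta = rng.normal(0.0, np.sqrt(10.0), N)   # r_eta = 10
nu = rng.normal(0.0, 1e-2, N)             # r_nu = 1e-4
u = lfilter(0.3 * np.array([1.0, 2.0, 1.0]), [1.0, 0.0, -0.17], eta) + nu
print(u.var())  # roughly 5.9: sum of squared impulse response times r_eta
```

The filter is stable (poles at q = ±√0.17), so u(t) is quasi-stationary as required by assumption A2 of Chapter 6.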
The corresponding results for the pole estimates based on the N4SID estimate of \Gamma_\alpha are given in Fig. 5.4. The performance is dramatically affected by the choice of α and β. To avoid misunderstanding, it should be noted that the standard deviations in Fig. 5.4 are not what is obtained by the N4SID algorithm. In [VD94b], the system matrices and thus the poles are estimated in a completely different way. However, the results presented here indicate that the N4SID estimate of \Gamma_\alpha may not be the best choice; this was also alluded to when the subspace estimation was discussed in Section 5.4.3. Monte Carlo simulations indicate that the N4SID algorithm [VD94b] has a performance similar to the shift invariance method based on the PO-MOESP subspace estimate.

We may also compare the standard deviations in Fig. 5.3 and Fig. 5.4 with the CRB. The CRB for this example was calculated to 0.176. Clearly, all the surfaces are above this limit.
In closing this example, we remind the reader of Corollary 5.2 and Corollary 5.4, which imply that the asymptotic accuracy of the pole estimates based on either the PO-MOESP or the CVA subspace estimate is identical.
5.8 Conclusions
We have used a linear regression formulation to link state-space subspace system identification (4SID) to traditional identification methods. This has been used to compare some cost-functions which arise implicitly in the ordinary framework of 4SID. From the cost-functions, estimates of
Figure 5.3: The normalized asymptotic standard deviation of the estimated pole as a function of α and β. The first plot is for the shift invariance method and the second for the optimally weighted subspace fitting method. The two pole estimation algorithms are based on the PO-MOESP estimate of \Gamma_\alpha.
Figure 5.4: The normalized asymptotic standard deviation of the estimated pole as a function of α and β. The first plot is for the shift invariance method and the second for the optimally weighted subspace fitting method. The two pole estimation algorithms are based on the N4SID estimate of \Gamma_\alpha.
the extended observability matrix have been derived and related to existing 4SID methods. Based on these estimates, the estimation of certain parameters was analyzed. In particular, the problem of estimating the poles was considered. The asymptotic properties of the estimates of two different methods were analyzed. The first method is common in 4SID and makes use of the shift invariance structure of the observability matrix, while the second is a recently proposed weighted subspace fitting method. From these results, the choice of user-specified parameters was discussed, and it was shown that this choice may indeed not be obvious for the shift invariance method. This problem can be mitigated by using more of the structure of the system. It was also shown that a proposed row-weighting matrix in the subspace estimation step does not affect the asymptotic properties of the estimates.
Chapter 6

On Consistency of Subspace Methods for System Identification
Subspace methods for the identification of linear time-invariant dynamical systems typically consist of two main steps. First, a so-called subspace estimate is constructed. This first step usually consists of estimating the range space of the extended observability matrix. Second, an estimate of the system parameters is obtained, based on the subspace estimate. In this chapter, the consistency of a large class of methods for estimating the extended observability matrix is analyzed. Persistence of excitation conditions on the input signal are given which guarantee consistency for systems with only measurement noise. For systems including process noise, it is shown that a persistence of excitation condition on the input is not sufficient. More precisely, an example is given for which the subspace methods fail to give a consistent estimate of the transfer function. This failure occurs even though the input is persistently exciting of any order. It is also shown that this problem is eliminated if stronger conditions are imposed on the input signal.
6.1 Introduction
The large interest in subspace methods for system identification is motivated by the need for useful engineering tools to model linear multivariable dynamical systems [VD94b, Lar90, Ver94, Vib95]. These methods, sometimes referred to as 4SID (subspace state-space system identification) methods, are easy to apply to the identification of multivariable systems. Reliable numerical algorithms form the basis of these methods. The statistical properties, however, are not yet fully understood. Initial work in this direction is exemplified by [VOWL91, PSD96, JW96a, VWO97, BDS97].
Conditions for consistency of the methods in the literature have often involved assumptions on the rank of certain matrices used in the schemes. It would be more useful to have explicit conditions on the input signals and on the system. The consistency of combined deterministic and stochastic subspace identification methods has been analyzed in [PSD96]. In that paper, consistency is proven under two assumptions. One is the assumption that the input is an ARMA process. The second is that a dimension parameter tends to infinity at a certain rate as the number of samples approaches infinity. This assumption is needed to consistently estimate the stochastic sub-system and the system order. Subspace methods are, however, typically applied with fixed dimension parameters. A consistency analysis under these conditions is given in [JW96b, JW97a]. The analysis is further extended in this chapter. It focuses on the crucial first step of the subspace identification methods, that is, on estimating the extended observability matrix, the "subspace estimation" step. The methods analyzed herein are called Basic-4SID and IV-4SID.
The Basic-4SID method [DVVM88, Liu92] has been analyzed in [Gop69, VOWL91, Liu92, VD92]. Herein, we extend the analysis and give persistence of excitation conditions on the input signal. The more recent IV-4SID class of methods includes N4SID [VD94b], MOESP [Ver94] and CVA [Lar90]. An open question regarding the IV-4SID methods has been whether a persistence of excitation condition on the input alone is sufficient for consistency. The current contribution employs a counterexample to show that this is not the case. For systems with only measurement noise, however, it is possible to show that a persistently exciting input of a given order ensures consistency of the IV-4SID subspace estimate. For systems including process noise, IV-4SID methods may suffer from consistency problems. This is explicitly shown in the aforementioned counterexample. A simple proof shows that a white noise (or a low order ARMA) input signal guarantees consistency in this case.
The rest of the chapter is organized as follows. In Section 6.2, the system model is defined and general assumptions are stated. Some preliminary results are also collected in this section. In Section 6.3, the ideas of the Basic-4SID and IV-4SID methods are presented. In Section 6.4, the consistency of the Basic-4SID method is analyzed. In Section 6.5, the critical relation for consistency of IV-4SID methods is presented, and conditions for this relation to hold are given. In Section 6.6, the counterexample is given, describing a system for which the critical relation does not hold. In Section 6.7, some simulations are presented to illustrate the theoretical results. Finally, conclusions are provided in Section 6.8.
6.2 Preliminaries

Assume that the true system is linear, time-invariant and described by the following state-space equations:

x(t + 1) = Ax(t) + Bu(t) + w(t) ,   (6.1a)
y(t) = Cx(t) + Du(t) + e(t) .   (6.1b)

Here, x(t) \in R^n is the state vector, u(t) \in R^m consists of the observed input signals, and y(t) \in R^l contains the observed output signals. The system is perturbed by the process noise w(t) \in R^n and the measurement noise e(t) \in R^l.
Let us introduce the following general assumptions:

A1: The noise process [w^T(t) \; e^T(t)]^T is a stationary, ergodic, zero-mean, white noise process with covariance matrix

E\left\{ \begin{bmatrix} w(t) \\ e(t) \end{bmatrix} \begin{bmatrix} w(t) \\ e(t) \end{bmatrix}^T \right\} = \begin{bmatrix} r_w & r_{we} \\ r_{we}^T & r_e \end{bmatrix} .

It will be convenient to introduce an n × p matrix K such that r_w = KK^T, where p is the rank of r_w. This implies that w(t) = Kv(t), where E\{v(t)v^T(t)\} = I_p and I_p denotes a p × p identity matrix.
A2: The input u(t) is a quasi-stationary signal [Lju87]. The correlation sequence is defined by

r_u(\tau) = \bar{E}\{u(t + \tau)u^T(t)\} ,

where \bar{E} is defined as

\bar{E}\{\cdot\} = \lim_{N\to\infty} \frac{1}{N} \sum_{t=1}^{N} E\{\cdot\} .

The input u(t) and the noise process are assumed to be uncorrelated.
A3: The system (6.1) is asymptotically stable. That is, the eigenvalues of A lie strictly inside the unit circle.

A4: The description (6.1) is minimal in the sense that the pair (A, C) is observable and the pair (A, [B K]) is reachable.
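Assumption A4 can be checked numerically for a given realization; the matrices below are a hypothetical illustration, not from the chapter:

```python
import numpy as np

# Check A4: (A, C) observable and (A, [B K]) reachable, via rank tests.
def obsv(A, C):
    n = A.shape[0]
    return np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(n)])

def ctrb(A, B):
    n = A.shape[0]
    return np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)])

A = np.array([[0.5, 1.0], [0.0, 0.3]])
B = np.array([[0.0], [1.0]])
K = np.array([[1.0], [0.0]])   # from r_w = K K^T
C = np.array([[1.0, 0.0]])

print(np.linalg.matrix_rank(obsv(A, C)))                   # 2: observable
print(np.linalg.matrix_rank(ctrb(A, np.hstack([B, K]))))   # 2: reachable
```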
The concept of persistently exciting signals is defined in the following way:

Definition 6.1. A (quasi-stationary) signal u(t) is said to be persistently exciting of order n, denoted by PE(n), if the following matrix is positive definite:

\begin{bmatrix}
r_u(0) & r_u(1) & \cdots & r_u(n-1) \\
r_u(-1) & r_u(0) & \cdots & r_u(n-2) \\
\vdots & & \ddots & \vdots \\
r_u(1-n) & r_u(2-n) & \cdots & r_u(0)
\end{bmatrix} .
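For a scalar input, Definition 6.1 can be tested on data by estimating the correlation sequence and checking the smallest eigenvalue of the Toeplitz matrix; the estimator and the tolerance below are our own choices:

```python
import numpy as np
from scipy.linalg import toeplitz

# Estimate r_u(tau) and test PE(n) via the smallest eigenvalue of the
# n x n correlation (Toeplitz) matrix; scalar input case only.
def is_pe(u, n, tol=1e-2):
    N = len(u)
    r = np.array([u[tau:] @ u[:N - tau] / N for tau in range(n)])
    R = toeplitz(r)  # symmetric, since r_u(-tau) = r_u(tau) for scalar u
    return np.linalg.eigvalsh(R).min() > tol

rng = np.random.default_rng(0)
white = rng.standard_normal(10000)
print(is_pe(white, 10))                 # True: white noise is PE of any order

sine = np.sin(0.5 * np.arange(10000))
print(is_pe(sine, 2), is_pe(sine, 3))   # True False: one sinusoid is PE(2) only
```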
The following two lemmas will be useful in the consistency analysis.

Lemma 6.1. Consider the (block) matrix

S = \begin{bmatrix} A & B \\ C & D \end{bmatrix} ,   (6.2)

where A is n × n and D is a non-singular m × m matrix. Then

\mathrm{rank}\{S\} = m + \mathrm{rank}\{A - BD^{-1}C\} .

Proof. The result follows directly from the identity

\begin{bmatrix} I & -BD^{-1} \\ 0 & I \end{bmatrix} \begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} I & 0 \\ -D^{-1}C & I \end{bmatrix} = \begin{bmatrix} \Delta & 0 \\ 0 & D \end{bmatrix} ,   (6.3)

where \Delta = A - BD^{-1}C.
Using (6.3), the next lemma is readily proven.

Lemma 6.2. Let S be as in (6.2) but symmetric (C = B^T), and let D be positive definite. Then S is positive (semi)definite if and only if (A - BD^{-1}B^T) is positive (semi)definite.
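Lemma 6.1 is easy to check numerically; the construction below (our example) makes the Schur complement zero, so rank{S} should equal m:

```python
import numpy as np

# Check Lemma 6.1: rank(S) = m + rank(A - B D^{-1} C) for invertible D.
rng = np.random.default_rng(1)
n, m = 4, 3
B = rng.standard_normal((n, m))
C = rng.standard_normal((m, n))
D = rng.standard_normal((m, m))          # invertible with probability one
A = B @ np.linalg.solve(D, C)            # forces A - B D^{-1} C = 0

S = np.block([[A, B], [C, D]])
schur = A - B @ np.linalg.solve(D, C)    # exactly the zero matrix here
lhs = np.linalg.matrix_rank(S)
rhs = m + np.linalg.matrix_rank(schur)
print(lhs, rhs)  # 3 3
```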
6.3 Background
In this section, the basic equations used in subspace state-space system identification (4SID) methods are reviewed. Define the lα × 1 vector of stacked outputs as

y_\alpha(t) = \left[\, y^T(t),\; y^T(t+1),\; \ldots,\; y^T(t+\alpha-1) \,\right]^T ,

where α is a user-specified parameter which is required to be greater than the observability index (or, for simplicity, the system order n). In a similar manner, define vectors of stacked inputs, process and measurement noises as u_\alpha(t), w_\alpha(t) and e_\alpha(t), respectively. By iterating the state equations in (6.1), it is straightforward to verify the following equation for the stacked quantities:
y_\alpha(t) = \Gamma_\alpha x(t) + \Phi_\alpha u_\alpha(t) + n_\alpha(t) ,   (6.4)

where

n_\alpha(t) = \Psi_\alpha w_\alpha(t) + e_\alpha(t) ,

and where the following block matrices have been introduced:

\Gamma_\alpha = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{\alpha-1} \end{bmatrix} ,
\quad
\Phi_\alpha = \begin{bmatrix} D & 0 & \cdots & 0 \\ CB & D & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ CA^{\alpha-2}B & CA^{\alpha-3}B & \cdots & D \end{bmatrix} ,
\quad
\Psi_\alpha = \begin{bmatrix} 0 & 0 & \cdots & 0 \\ C & 0 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ CA^{\alpha-2} & CA^{\alpha-3} & \cdots & 0 \end{bmatrix} .

The key idea of subspace identification methods is to first estimate the n-dimensional range space of the (extended) observability matrix \Gamma_\alpha. Based
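The block matrices defined above can be constructed directly from (A, B, C, D); the helper names below are ours and the numerical values are a one-dimensional illustration:

```python
import numpy as np

# Build the extended observability matrix Gamma_alpha and the lower block
# Toeplitz matrix Phi_alpha of Markov parameters, as defined after (6.4).
def gamma(A, C, alpha):
    return np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(alpha)])

def phi(A, B, C, D, alpha):
    l, m = D.shape
    # Markov parameters: D, CB, CAB, CA^2 B, ...
    markov = [D] + [C @ np.linalg.matrix_power(A, k) @ B for k in range(alpha - 1)]
    blocks = [[markov[i - j] if i >= j else np.zeros((l, m))
               for j in range(alpha)] for i in range(alpha)]
    return np.block(blocks)

A = np.array([[0.7]]); B = np.array([[1.0]])
C = np.array([[1.0]]); D = np.array([[0.0]])
print(gamma(A, C, 4).ravel())   # powers of the pole 0.7
print(phi(A, B, C, D, 3))       # lower triangular block Toeplitz
```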
on the first step, the system parameters are subsequently estimated in various ways. For consistency of 4SID methods, it is crucial that the estimate of \Gamma_\alpha is consistent.

There have been several suggestions in the literature for estimating \Gamma_\alpha. Most of them, however, are closely related to one another [VD95c, Vib95]. To describe the methods considered, additional notation is needed. Assume there are N + α + β − 1 data samples available and introduce the block Hankel matrices

Y_\alpha = [\, y_\alpha(1+\beta),\; \ldots,\; y_\alpha(N+\beta) \,] ,   (6.5)
Y_\beta = [\, y_\beta(1),\; \ldots,\; y_\beta(N) \,] .   (6.6)
The parameter β is to be chosen by the user. Intuitively, β can be interpreted as the dimension of the observer used to estimate the current state [WJ94] (see also Chapter 1 and Chapter 5). The partitioning of data into Y_\beta and Y_\alpha is sometimes referred to as "past" and "future". Define U_\alpha and U_\beta as block Hankel matrices of inputs similar to (6.5) and (6.6), respectively. Comparing (6.5) and (6.4), it is clear that Y_\alpha is given by the relation

Y_\alpha = \Gamma_\alpha X_f + \Phi_\alpha U_\alpha + N_\alpha ,   (6.7)

where

X_f = [\, x(1+\beta) \;\; x(2+\beta) \;\; \ldots \;\; x(N+\beta) \,] ,

and N_\alpha is defined from n_\alpha(t) similarly to (6.5). Two methods of estimating the range space of \Gamma_\alpha from data are presented next. The methods are based on (6.7).
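The block Hankel matrices (6.5)-(6.6) can be formed as follows for a scalar output; indexing is zero-based and the helper name is ours:

```python
import numpy as np

# Build Y_alpha = [y_alpha(1+beta), ..., y_alpha(N+beta)] from a scalar
# output sequence y(1), ..., y(N + alpha + beta - 1).
def block_hankel(y, alpha, beta, N):
    # column t (t = 0..N-1 here) holds y(t+beta), ..., y(t+beta+alpha-1)
    return np.column_stack([y[t + beta: t + beta + alpha] for t in range(N)])

y = np.arange(1.0, 11.0)            # y(1)..y(10) as a stand-in signal
Ya = block_hankel(y, alpha=3, beta=2, N=5)
print(Ya)  # rows: 3..7, 4..8, 5..9 (constant along anti-diagonals)
```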
6.3.1 Basic 4SID

The method described in this section is sometimes referred to as the Basic 4SID method [DVVM88, Liu92]. When considering this method, it is not necessary to partition the data into past and future. Hence, it can be assumed that β = 0. The idea is to study the residuals in the regression of y_\alpha(t) on u_\alpha(t) in (6.4). Introduce the orthogonal projection matrix onto the null space of U_\alpha as

\Pi^\perp_{U_\alpha^T} = I - U_\alpha^T (U_\alpha U_\alpha^T)^{-1} U_\alpha .
If the input u(t) is PE(α), then the inverse of U_\alpha U_\alpha^T exists for sufficiently large N. The residual vectors in the regression of y_\alpha(t) on u_\alpha(t) are now given by the columns of the following expression:

\frac{1}{\sqrt{N}} Y_\alpha \Pi^\perp_{U_\alpha^T} = \frac{1}{\sqrt{N}} \Gamma_\alpha X_f \Pi^\perp_{U_\alpha^T} + \frac{1}{\sqrt{N}} N_\alpha \Pi^\perp_{U_\alpha^T} ,   (6.8)

where the normalizing factor 1/\sqrt{N} has been introduced for later convenience. The approach taken in Basic 4SID to estimate the range space of \Gamma_\alpha is the following: retain the n principal left singular vectors in the singular value decomposition (SVD) of (6.8). The Basic 4SID estimate of \Gamma_\alpha is then given by

\hat{\Gamma}_\alpha = \hat{Q}_s ,   (6.9)

where \hat{Q}_s is obtained from the SVD

\begin{bmatrix} \hat{Q}_s & \hat{Q}_n \end{bmatrix} \begin{bmatrix} \hat{S}_s & 0 \\ 0 & \hat{S}_n \end{bmatrix} \begin{bmatrix} \hat{V}_s^T \\ \hat{V}_n^T \end{bmatrix} = \frac{1}{\sqrt{N}} Y_\alpha \Pi^\perp_{U_\alpha^T} .

Here, \hat{S}_s is a diagonal matrix containing the n largest singular values and \hat{Q}_s contains the corresponding left singular vectors.
6.3.2 IV-4SID

The second type of estimator that will be analyzed is called IV-4SID [VD94b, Lar90, Ver94]. The reason for this name is that the approach is most easily explained in terms of instrumental variables (IVs) (see e.g. [Vib95, Gus97]). The advantage of IV-4SID (compared to Basic 4SID) is that IV-4SID can handle more general noise correlations. The main idea is to use an instrumental variable which not only removes the effect of uα(t) in (6.4) (as in Basic 4SID), but also reduces the effect of nα(t). The first step is to remove the future inputs; this is the same as in Basic 4SID. The second step is to remove the noise term in (6.8). This is accomplished by multiplying from the right by the "instrumental variable matrix"
\[
P_\beta = \begin{bmatrix} Y_\beta \\ U_\beta \end{bmatrix}. \tag{6.10}
\]
Post-multiplying both sides of (6.8) with the matrix in (6.10), and normalizing by 1/√N, yields
\[
\frac{1}{N} Y_\alpha \Pi^{\perp}_{U_\alpha^T} P_\beta^T
= \frac{1}{N} \Gamma_\alpha X_f \Pi^{\perp}_{U_\alpha^T} P_\beta^T
+ \frac{1}{N} N_\alpha \Pi^{\perp}_{U_\alpha^T} P_\beta^T ,
\]
6 On Consistency of Subspace Methods for System Identification
Method      Ŵr                                  Ŵc
PO-MOESP    I                                   ( (1/N) Pβ Π⊥_{Uαᵀ} Pβᵀ )^{−1/2}
N4SID       I                                   ( (1/N) Pβ Π⊥_{Uαᵀ} Pβᵀ )^{−1} ( (1/N) Pβ Pβᵀ )^{1/2}
IVM         ( (1/N) Yα Π⊥_{Uαᵀ} Yαᵀ )^{−1/2}    ( (1/N) Pβ Pβᵀ )^{−1/2}
CVA         ( (1/N) Yα Π⊥_{Uαᵀ} Yαᵀ )^{−1/2}    ( (1/N) Pβ Π⊥_{Uαᵀ} Pβᵀ )^{−1/2}

Table 6.1: Weighting matrices corresponding to specific methods. Notice that some of the weightings may not be as they appear in the referred papers. These weightings, however, give estimates of Γα identical to those obtained using the original choice of weighting.
where, due to the assumptions, the second term on the right tends to zero with probability one (w.p.1) as N tends to infinity (since the input and the noise process are uncorrelated, and since there is a "time shift" between Nα and the noise in Yβ). A quite general estimate of Γα is then given by
\[
\hat{\Gamma}_\alpha = \hat{W}_r^{-1} \hat{Q}_s , \tag{6.11}
\]
where Q̂s is the αl × n matrix made of the n left singular vectors of
\[
\frac{1}{N} \hat{W}_r Y_\alpha \Pi^{\perp}_{U_\alpha^T} P_\beta^T \hat{W}_c \tag{6.12}
\]
corresponding to the n largest singular values. The (data dependent) weighting matrices Ŵr and Ŵc can be chosen in different ways to yield a class of estimators. This class includes most of the known methods appearing in the literature; see Table 6.1, cf. [VD95c, Vib95]. The methods PO-MOESP, N4SID, IVM and CVA appear in [Ver94, VD94b, Vib95, Lar90], respectively. For some results regarding the effect of the choice of the weighting matrices, see [JW95, JW96a, VD95b].
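A hedged sketch of the weighted estimate (6.11)-(6.12), using the PO-MOESP row of Table 6.1 (Ŵr = I, Ŵc the inverse matrix square root of (1/N)PβΠ⊥Pβᵀ). The toy system, dimensions and noise level are assumptions; this is not the algorithm as published in the referred papers.

```python
import numpy as np

def iv_4sid_po_moesp(Yf, Uf, Yp, Up, n):
    """IV-4SID estimate (6.11)-(6.12) with PO-MOESP weightings from
    Table 6.1: W_r = I and W_c = ((1/N) P Pi P^T)^{-1/2}."""
    N = Yf.shape[1]
    P = np.vstack([Yp, Up])                       # instrumental variables (6.10)
    Pi = np.eye(N) - Uf.T @ np.linalg.solve(Uf @ Uf.T, Uf)
    M = (P @ Pi @ P.T) / N
    w, V = np.linalg.eigh(M)
    Wc = V @ np.diag(w ** -0.5) @ V.T             # inverse matrix square root
    G = (Yf @ Pi @ P.T) / N @ Wc                  # the matrix in (6.12)
    Qs, _, _ = np.linalg.svd(G)
    return Qs[:, :n]                              # W_r = I, so Gamma_hat = Q_s

# Toy data: x(t+1) = 0.5 x(t) + u(t), y = x + e, white measurement noise.
rng = np.random.default_rng(2)
Nt, alpha, beta = 2000, 3, 3
u = rng.standard_normal(Nt)
x = np.zeros(Nt + 1)
for t in range(Nt):
    x[t + 1] = 0.5 * x[t] + u[t]
y = x[:Nt] + 0.1 * rng.standard_normal(Nt)

N = Nt - alpha - beta + 1
Up = np.array([u[i:i + N] for i in range(beta)])                 # past inputs
Yp = np.array([y[i:i + N] for i in range(beta)])                 # past outputs
Uf = np.array([u[beta + i:beta + i + N] for i in range(alpha)])  # future inputs
Yf = np.array([y[beta + i:beta + i + N] for i in range(alpha)])  # future outputs

Gh = iv_4sid_po_moesp(Yf, Uf, Yp, Up, 1)
Gamma = np.array([1.0, 0.5, 0.25])
cosang = abs(Gh[:, 0] @ Gamma) / np.linalg.norm(Gamma)
```

Unlike Basic 4SID, the IV step makes the estimate insensitive to the measurement noise asymptotically, so cosang should approach 1 as N grows.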
6.4 Analysis of Basic 4SID

In this section, conditions are established under which the Basic 4SID estimate Γ̂α in (6.9) becomes consistent. Here, consistency means that the
column span of Γ̂α tends to the column span of Γα as N tends to infinity (w.p.1). This method has been analyzed previously (see [Gop69, VOWL91, Liu92, VD92]). The problem is reconsidered here for two reasons. First, the proof to be presented appears to be more explicit than previous ones (known to the authors). Second, these results are useful in the analysis of the more interesting IV-based methods given in the next section.
Recall Equation (6.4). No correlation exists between the noise nα(t) and uα(t) or x(t). It is therefore relatively straightforward to show that the sample covariance matrix of the residuals in (6.8),
\[
\hat{R}_\varepsilon = \frac{1}{N} Y_\alpha \Pi^{\perp}_{U_\alpha^T} Y_\alpha^T ,
\]
tends to
\[
R_\varepsilon = \Gamma_\alpha \left( r_x - r_{xu} R_u^{-1} r_{xu}^T \right) \Gamma_\alpha^T + R_n \tag{6.13}
\]
with probability one, as N → ∞. Here, the various correlation matrices are defined as:
\[
r_x = \bar{E}\{x(t)x^T(t)\}, \quad
r_{xu} = \bar{E}\{x(t)u_\alpha^T(t)\}, \quad
R_u = \bar{E}\{u_\alpha(t)u_\alpha^T(t)\}, \quad
R_n = \bar{E}\{n_\alpha(t)n_\alpha^T(t)\}.
\]
Recall that u(t) is assumed to be PE(α). This implies that Ru is positive definite. From (6.13) it can be concluded that the estimate of Γα in (6.9) is consistent if and only if¹
\[
R_n = \sigma^2 I, \tag{6.14}
\]
\[
r_x - r_{xu} R_u^{-1} r_{xu}^T > 0 \tag{6.15}
\]
for some scalar σ². If (6.14) and (6.15) hold, then the limit of Γ̂α (as defined in (6.9)) is given by ΓαT for some non-singular n × n matrix T. The condition (6.14) is essentially only true for output error systems perturbed by measurement noise of equal power in each output channel.

¹There may exist particular cases in which Basic 4SID is consistent even when Rn ≠ σ²I. However, these cases depend on the system parameters and are not of interest here.
That is, if K = 0 and re = σ²I. In the following it is shown under what conditions (on the input signal) the condition (6.15) is fulfilled.

In view of Lemma 6.2, the condition (6.15) is equivalent to
\[
\bar{E}\left\{
\begin{bmatrix} x(t) \\ u_\alpha(t) \end{bmatrix}
\begin{bmatrix} x(t) \\ u_\alpha(t) \end{bmatrix}^T
\right\}
= \begin{bmatrix} r_x & r_{xu} \\ r_{xu}^T & R_u \end{bmatrix} > 0. \tag{6.16}
\]
Next, the state is decomposed into two terms x(t) = x^d(t) + x^s(t), where the "deterministic" term x^d(t) is due to the observed inputs and the "stochastic" term x^s(t) is driven by the process noise w(t). Due to assumption A2, no correlation exists between x^s(t) and x^d(t) or uα(t). Furthermore, introduce a transfer function (operator) description of x^d(t) as
\[
x^d(t) = \frac{F^u(q^{-1})}{a(q^{-1})} u(t),
\]
where q^{-1} is the backward shift operator; that is, q^{-1}u(t) = u(t − 1). The polynomials F^u(q^{-1}) and a(q^{-1}) are defined by the relations
\[
F^u(q^{-1}) = \frac{\mathrm{Adj}\{qI - A\}B}{q^n}
= F_1^u q^{-1} + F_2^u q^{-2} + \cdots + F_n^u q^{-n}, \tag{6.17}
\]
\[
a(q^{-1}) = \frac{\det\{qI - A\}}{q^n}
= a_0 + a_1 q^{-1} + \cdots + a_n q^{-n}, \quad (a_0 = 1), \tag{6.18}
\]
where Adj{A} denotes the adjugate matrix of A and det{·} is the determinant function. The polynomial coefficients a_i (i = 0, ..., n) are scalars, while the F_i^u (i = 1, ..., n) are n × m matrices. Analogously to F^u(q^{-1}), define the matrix polynomial F^w(q^{-1}), with K replacing B:
\[
F^w(q^{-1}) = F_1^w q^{-1} + F_2^w q^{-2} + \cdots + F_n^w q^{-n},
\]
where the F_i^w (i = 1, ..., n) are n × p matrices and K is defined in assumption A1. Thus, x^s(t) is given by the relation
\[
x^s(t) = \frac{F^w(q^{-1})}{a(q^{-1})} v(t).
\]
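The coefficients a_i and F^u_i in (6.17)-(6.18) can be generated with the standard Faddeev–LeVerrier recursion (textbook material, not part of this chapter; the γ = 0.5 companion-form example below is an assumption):

```python
import numpy as np

def leverrier(A, B):
    """Faddeev-LeVerrier recursion: returns the coefficients a_0..a_n of
    a(q^{-1}) in (6.18) (a_0 = 1) and the matrices F^u_1..F^u_n of (6.17),
    using Adj(qI - A) = sum_{k=0}^{n-1} M_k q^{n-1-k}, with M_0 = I and
    M_k = A M_{k-1} + a_k I."""
    n = A.shape[0]
    a = [1.0]
    M = np.eye(n)
    F = []
    for k in range(1, n + 1):
        F.append(M @ B)                 # F^u_k = M_{k-1} B
        AM = A @ M
        a.append(-np.trace(AM) / k)
        M = AM + a[k] * np.eye(n)
    # by Cayley-Hamilton, M is now the zero matrix
    return np.array(a), F

g = 0.5                                          # assumed example pole location
A = np.array([[2 * g, -g ** 2], [1.0, 0.0]])     # companion form of (1 - g q^{-1})^2
B = np.array([[1.0], [-2.0]])
a, F = leverrier(A, B)
```

For this example a(q⁻¹) = 1 − 2γq⁻¹ + γ²q⁻², i.e. a = [1, −1, 0.25], and F^u_1 = B.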
Also define the Sylvester-like matrix
\[
S = \begin{bmatrix}
F_n^u & \cdots & F_1^u & 0 & \cdots & 0 & F_n^w & \cdots & F_1^w \\
a_n I_m & \cdots & a_1 I_m & a_0 I_m & & & 0 & \cdots & 0 \\
 & \ddots & & & \ddots & & \vdots & & \vdots \\
0 & & a_n I_m & \cdots & \cdots & a_0 I_m & 0 & \cdots & 0
\end{bmatrix} \tag{6.19}
\]
of dimension (n + αm) × ((α + n)m + np).
Using the notation introduced above, it can be verified that
\[
\begin{bmatrix} x(t) \\ u_\alpha(t) \end{bmatrix}
= S \, \frac{1}{a(q^{-1})}
\begin{bmatrix} u(t-n) \\ \vdots \\ u(t+\alpha-1) \\ v(t-n) \\ \vdots \\ v(t-1) \end{bmatrix}.
\]
It is shown in Appendix A.14 that S has full row rank under assumption A4. This implies that the condition (6.16) is satisfied if the following matrix is positive definite:
\[
\bar{E}\left\{
\frac{1}{a(q^{-1})}
\begin{bmatrix} u(t-n) \\ \vdots \\ u(t+\alpha-1) \\ v(t-n) \\ \vdots \\ v(t-1) \end{bmatrix}
\frac{1}{a(q^{-1})}
\begin{bmatrix} u(t-n) \\ \vdots \\ u(t+\alpha-1) \\ v(t-n) \\ \vdots \\ v(t-1) \end{bmatrix}^T
\right\}. \tag{6.20}
\]
Since u(t) and v(t) are uncorrelated, the covariance matrix in (6.20) will be block diagonal. The lower diagonal block is positive definite since E{v(t)vᵀ(τ)} = Iδ_{t,τ}. The upper diagonal block can be shown to be positive definite if u(t) is PE(α + n); see Lemma A.3.7 in [SS83], modified to quasi-stationary signals. In conclusion, the following theorem has been proven.

Theorem 6.1. Under the general assumptions A1–A4, the Basic 4SID subspace estimate (6.9) is consistent if the following conditions are true:

1. The covariance matrix of the stacked noise is proportional to the identity matrix. That is, Rn = σ²I.
2. The input signal u(t) is persistently exciting of order α + n.

Remark 6.1. As previously mentioned, condition (1) of the theorem is essentially only true if K = 0 and if re = σ²I. Furthermore, K = 0 implies that the system must be reachable from u(t).
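Condition 2 can be probed numerically from input data: u(t) is persistently exciting of a given order when the sample covariance of the stacked input vector is positive definite. A sketch (the tolerance and test signals are assumptions; a single sinusoid is the classic example of a signal that is PE(2) but not PE(3)):

```python
import numpy as np

def pe_order_holds(u, order, tol=1e-8):
    """Numerically check that a scalar input is persistently exciting of the
    given order: the sample covariance of [u(t), ..., u(t+order-1)]^T must
    be positive definite (the tolerance is an assumed threshold)."""
    N = len(u) - order + 1
    U = np.array([u[i:i + N] for i in range(order)])
    return np.linalg.eigvalsh((U @ U.T) / N).min() > tol

rng = np.random.default_rng(3)
white = rng.standard_normal(3000)      # white noise: PE of any order
sine = np.sin(0.7 * np.arange(3000))   # a single sinusoid: PE(2) but not PE(3)
```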
6.5 Analysis of IV-4SID

In this section, conditions for the IV-4SID estimate Γ̂α in (6.11) to be consistent are established. As N → ∞, the limit of the quantity in (6.12) is w.p.1 given by
\[
W_r \Gamma_\alpha \left( r_{xp} - r_{xu} R_u^{-1} R_{up} \right) W_c , \tag{6.21}
\]
where Wr and Wc are the limits of Ŵr and Ŵc, respectively. The correlation matrices appearing in (6.21) are defined as follows:
\[
r_{xp} = \bar{E}\{x(t+\beta)p_\beta^T(t)\}, \quad
r_{xu} = \bar{E}\{x(t)u_\alpha^T(t)\}, \quad
R_u = \bar{E}\{u_\alpha(t)u_\alpha^T(t)\}, \quad
R_{up} = \bar{E}\{u_\alpha(t+\beta)p_\beta^T(t)\},
\]
where p_β(t), t = 1, ..., N, are defined as the columns of Pβ in (6.10). If the weighting matrices Wr and Wc are positive definite, the estimate (6.11) is consistent if and only if the following matrix has full rank equal to n:
\[
r_{xp} - r_{xu} R_u^{-1} R_{up} . \tag{6.22}
\]
This is because the limiting principal left singular vectors of (6.12) will be equal to WrΓαT for some non-singular n × n matrix T. Hence, the limit of Γ̂α in (6.11) is ΓαT. In view of Lemma 6.1, the condition that (6.22) should have full rank can be reformulated as
\[
\operatorname{rank} \bar{E}\left\{
\begin{bmatrix} x(t+\beta) \\ u_\alpha(t+\beta) \end{bmatrix}
\begin{bmatrix} y_\beta(t) \\ u_\beta(t) \\ u_\alpha(t+\beta) \end{bmatrix}^T
\right\} = n + \alpha m . \tag{6.23}
\]
This is the critical relation for consistency of IV-4SID methods.
Method      Order of PE       Condition on noise
PO-MOESP    α + β             E{nβ(t) nβᵀ(t)} > 0
N4SID       α + β             E{nβ(t) nβᵀ(t)} > 0
IVM         ξ ≜ max(α, β)     E{nξ(t) nξᵀ(t)} > 0
CVA         α + β             E{nξ(t) nξᵀ(t)} > 0

Table 6.2: Sufficient conditions for the weighting matrices Wr and Wc to be positive definite.
Let us begin by deriving conditions which ensure that the weighting matrices in Table 6.1 exist (i.e., are positive definite). Consider for example the limit of the column weighting Wc for the PO-MOESP method. We need to establish that PβΠ⊥_{Uαᵀ}Pβᵀ/N is non-singular for large N. Using Lemma 6.2, the condition W_c^{−2} > 0 can be written as
\[
\bar{E}\left\{
\begin{bmatrix} y_\beta(t) \\ u_\beta(t) \\ u_\alpha(t+\beta) \end{bmatrix}
\begin{bmatrix} y_\beta(t) \\ u_\beta(t) \\ u_\alpha(t+\beta) \end{bmatrix}^T
\right\} > 0. \tag{6.24}
\]
This is recognized as a requirement similar to one appearing in least-squares linear regression estimation [SS89, Complement C5.1]. The condition (6.24) holds if u(t) is PE(α + β) and if E{nβ(t)nβᵀ(t)} > 0. Similarly, conditions for all cases in Table 6.1 can be obtained. The result is summarized in Table 6.2.
Having established that the weighting matrices are positive definite, condition (6.23) may now be examined. The correlation matrix in (6.23) can be written as the sum of two terms:
\[
\bar{E}\left\{
\begin{bmatrix} x^d(t+\beta) \\ u_\alpha(t+\beta) \end{bmatrix}
\begin{bmatrix} y_\beta^d(t) \\ u_\beta(t) \\ u_\alpha(t+\beta) \end{bmatrix}^T
\right\}
+
\bar{E}\left\{
\begin{bmatrix} x^s(t+\beta) \\ 0 \end{bmatrix}
\begin{bmatrix} y_\beta^s(t) \\ 0 \\ 0 \end{bmatrix}^T
\right\}, \tag{6.25}
\]
where the superscripts (·)^d and (·)^s denote the deterministic part and the stochastic part, respectively. Below we show that the first term in (6.25) has full row rank under a PE condition on u(t). This implies that (6.23) is fulfilled for systems with no process noise (since x^s(t) ≡ 0 in that case). For systems with process noise, one may in exceptional cases lose rank when adding the two terms in (6.25). One example of such a
situation is given in the next section. Generally, loss of rank is unlikely to occur, at least if βl ≫ n; in that case the matrix in (6.25) has many more columns than rows and is therefore more likely to have full rank. This last point can be made more precise (see Theorem 6.5 and [PSD96]).
In the following it is shown that the first term in (6.25) has full row rank under certain conditions. It can be verified that the correlation matrix is equal to
\[
\begin{bmatrix} A^{\beta} & C_\beta(A,B) & 0 \\ 0 & 0 & I_{\alpha m} \end{bmatrix}
\bar{E}\left\{
\begin{bmatrix} x^d(t) \\ u_\beta(t) \\ u_\alpha(t+\beta) \end{bmatrix}
\begin{bmatrix} x^d(t) \\ u_\beta(t) \\ u_\alpha(t+\beta) \end{bmatrix}^T
\right\}
\begin{bmatrix} \Gamma_\beta^T & 0 & 0 \\ \Phi_\beta^T & I_{\beta m} & 0 \\ 0 & 0 & I_{\alpha m} \end{bmatrix}, \tag{6.26}
\]
where C_β(A,B) denotes the reversed reachability matrix. That is,
\[
C_\beta(A,B) = \begin{bmatrix} A^{\beta-1}B & A^{\beta-2}B & \cdots & B \end{bmatrix}. \tag{6.27}
\]
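The rank behaviour of Cβ(A,B) as β grows past the controllability index is easy to probe numerically; the two-state pair below is an assumed example, not a system from the thesis:

```python
import numpy as np

def rev_reach(A, B, beta):
    """Reversed reachability matrix C_beta(A,B) = [A^{beta-1}B ... AB B], cf. (6.27)."""
    blocks, P = [], np.eye(A.shape[0])
    for _ in range(beta):
        blocks.append(P @ B)           # B, AB, A^2 B, ...
        P = A @ P
    return np.hstack(blocks[::-1])     # reverse to [A^{beta-1}B ... B]

# Assumed reachable two-state pair with controllability index 2.
A = np.array([[0.5, 1.0], [0.0, 0.3]])
B = np.array([[0.0], [1.0]])
r1 = np.linalg.matrix_rank(rev_reach(A, B, 1))   # beta = 1: rank 1 only
r2 = np.linalg.matrix_rank(rev_reach(A, B, 2))   # beta = 2: full rank n = 2
```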
Clearly, the first factor has full row rank (n + αm) if the system (6.1) is reachable from u(t) and if β is greater than or equal to the controllability index. Notice that the last factor has full row rank, provided β is greater than or equal to the observability index. A sufficient condition is then obtained if the covariance matrix in the middle of (6.26) is positive definite. We have
\[
\bar{E}\left\{
\begin{bmatrix} x^d(t) \\ u_\beta(t) \\ u_\alpha(t+\beta) \end{bmatrix}
\begin{bmatrix} x^d(t) \\ u_\beta(t) \\ u_\alpha(t+\beta) \end{bmatrix}^T
\right\}
= S \, \bar{E}\left\{
\left[ \frac{1}{a(q^{-1})} u_{\alpha+\beta+n}(t-n) \right]
\left[ \frac{1}{a(q^{-1})} u_{\alpha+\beta+n}^T(t-n) \right]
\right\} S^T , \tag{6.28}
\]
where
\[
S = \begin{bmatrix}
F_n^u & \cdots & F_1^u & 0 & \cdots & 0 \\
a_n I_m & \cdots & a_1 I_m & a_0 I_m & & 0 \\
 & \ddots & & & \ddots & \\
0 & & a_n I_m & \cdots & \cdots & a_0 I_m
\end{bmatrix} \tag{6.29}
\]
of dimension (n + αm + βm) × ((α + β + n)m).
If the system is reachable from u(t), then S has full row rank according to Appendix A.14. The matrix in (6.28) is then positive definite if u(t) is PE(α + β + n). The result is summarized in the following theorem.

Theorem 6.2. The IV-4SID subspace estimate Γ̂α given by (6.11) is consistent if the general assumptions A1–A4 apply in addition to the following conditions:

1. (A, B) is reachable.
2. β ≥ max(observability index, controllability index).
3. rw = 0.
4. re > 0.
5. The input is persistently exciting of order α + β + n.

Remark 6.2. The condition rw = 0 ensures that the second term in (6.25) is zero. Furthermore, the conditions for the weighting matrices in Table 6.2 are fulfilled if the last two conditions hold.
We conclude this section by giving three alternative consistency theorems for IV-4SID. The first relaxes the order of PE on u(t) for single input systems, while the second strengthens the conditions on u(t) in order to prove consistency for systems with process noise. Finally, the third theorem considers single input ARMA (autoregressive moving average) input signals.

Theorem 6.3. For single input systems, the IV-4SID subspace estimate Γ̂α given by (6.11) is consistent if the general assumptions A1–A4 apply in addition to the following conditions:

1. (A, B) is reachable.
2. β ≥ the observability index.
3. rw = 0.
4. re > 0.
5. The input is persistently exciting of order α + n.
Proof. Consider a single input system and define the vector polynomial in q^{-1}
\[
F^y(q^{-1}) = \frac{C\,\mathrm{Adj}\{qI - A\}B}{q^n} + D
= F_0^y + F_1^y q^{-1} + F_2^y q^{-2} + \cdots + F_n^y q^{-n},
\]
where the F_i^y (i = 0, ..., n) are l × 1 vectors. This implies that
\[
y^d(t) = \frac{\sum_{k=0}^{n} F_k^y q^{-k}}{\sum_{k=0}^{n} a_k q^{-k}} \, u(t), \tag{6.30}
\]
where {a_i} is defined in (6.18). Next, define the Sylvester matrix
\[
S_y = \begin{bmatrix}
F_n^y & \cdots & F_0^y & 0 & 0 & \cdots & 0 \\
 & \ddots & & \ddots & & & \vdots \\
0 & & F_n^y & \cdots & F_0^y & \cdots & 0 \\
a_n & \cdots & a_0 & 0 & 0 & \cdots & 0 \\
 & \ddots & & \ddots & & & \vdots \\
0 & & a_n & \cdots & a_0 & \cdots & 0 \\
0 & \cdots & 0 & a_n & \cdots & a_0 & 0 \\
\vdots & & & & \ddots & & \ddots \\
0 & \cdots & 0 & 0 & a_n & \cdots & a_0
\end{bmatrix} \tag{6.31}
\]
consisting of βl rows of shifted F^y coefficients, β rows of shifted a coefficients, and α further rows of shifted a coefficients; the total dimension is (βl + β + α) × (n + α + β). It can be shown that the rank of S_y is n + α + β if the system is observable and reachable from u(t), and if β is greater than or equal to the observability index [AJ76, BKAK78]. A straightforward proof of this fact is given in Appendix A.15, using the notation introduced in this chapter. The first term in (6.25) can now be written as
this chapter. The first term <strong>in</strong> (6.25) can now be written as<br />
⎧<br />
⎪⎨ � �<br />
d x (t + β)<br />
Ē<br />
⎪⎩ uα(t + β)<br />
⎡ ⎤T<br />
⎣ ⎦<br />
⎫ ⎪⎬ ⎪⎭ = SRUS T y , (6.32)<br />
y d β (t)<br />
uβ(t)<br />
uα(t + β)<br />
where<br />
RU = Ē<br />
�<br />
1<br />
a(q−1 ) un+α(t<br />
1<br />
+ β − n)<br />
a(q−1 ) uT �<br />
α+β+n(t − n) .
The covariance matrix RU has full row rank if u is PE(α + n). It is then clear that (6.32) has full rank, since S is non-singular and Sy has full column rank.

Therefore, for single input systems we can relax the assumption on the input signal. The proof of Theorem 6.3 does not carry over to the multi-input case, because Sy does not have full column rank if m > 1 (see Appendix A.15).

Next, we give the following consistency result concerning the case when the input is a white noise process.
Theorem 6.4. Let the input u(t) to the system in (6.1) be a zero-mean white sequence in the sense that r_u(τ) = r_u(0)δ_{τ,0} and r_u(0) > 0. Assume that A1–A4 hold, that the weighting matrices Wr and Wc are positive definite, and that
\[
\operatorname{rank} \begin{bmatrix} C_\beta(A,B) & C_\beta(A,G) \end{bmatrix} = n. \tag{6.33}
\]
Here, C_β(A,B) is defined in (6.27),
\[
C_\beta(A,G) = \begin{bmatrix} A^{\beta-1}G & A^{\beta-2}G & \cdots & G \end{bmatrix},
\]
and the matrix G is defined as
\[
G = \bar{E}\{x(t+1)y^T(t)\}.
\]
Given the above assumptions, the IV-4SID subspace estimate Γ̂α defined in (6.11) is consistent.
Proof. Consider the correlation matrix in (6.23):
\[
\bar{E}\left\{
\begin{bmatrix} x(t+\beta) \\ u_\alpha(t+\beta) \end{bmatrix}
\begin{bmatrix} y_\beta(t) \\ u_\beta(t) \\ u_\alpha(t+\beta) \end{bmatrix}^T
\right\}
=
\begin{bmatrix}
\bar{E}\{x(t+\beta)y_\beta^T(t)\} & \bar{E}\{x(t+\beta)u_\beta^T(t)\} & 0 \\
0 & 0 & R_{u_\alpha}
\end{bmatrix}, \tag{6.34}
\]
where
\[
R_{u_\alpha} = \bar{E}\{u_\alpha(t)u_\alpha^T(t)\} = I_\alpha \otimes r_u(0).
\]
The symbol ⊗ denotes the Kronecker product. In (6.34), we used the fact that Ē{x(t)uᵀ(t + k)} = 0 for k ≥ 0, since u(t) is a white sequence.
Notice that Ruα > 0 since ru(0) > 0. Under the given assumptions it is readily proven that
\[
\bar{E}\{x(t+\beta)y_\beta^T(t)\} = C_\beta(A,G)
\]
and
\[
\bar{E}\{x(t+\beta)u_\beta^T(t)\} = C_\beta(A,B)\,(I_\beta \otimes r_u(0)).
\]
We conclude that the matrix in (6.34) has full row rank under the reachability assumption given in (6.33). This concludes the proof.

Notice that Theorem 6.4 guarantees consistency even for systems with process noise. Also observe that here we were able to relax the reachability condition on (A, B) compared to the previous theorems. Some other results for the case of a white noise input signal can be found in [Ver93].
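The identity Ē{x(t+β)uβᵀ(t)} = Cβ(A,B)(Iβ ⊗ ru(0)) used in the proof can be checked by simulation for a white input; the system matrices, β = 2 and the sample size below are assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
A = np.array([[0.5, 0.2], [0.0, 0.3]])   # assumed stable test system
B = np.array([[1.0], [1.0]])
beta, Nt = 2, 100000
u = rng.standard_normal(Nt)              # white input with r_u(0) = 1

x = np.zeros((2, Nt + 1))
for t in range(Nt):
    x[:, t + 1] = A @ x[:, t] + B[:, 0] * u[t]

# Sample estimate of E{x(t+beta) u_beta^T(t)}, with u_beta(t) = [u(t); u(t+1)].
N = Nt - beta
Xb = x[:, beta:beta + N]
Ub = np.array([u[i:i + N] for i in range(beta)])
Rxu = (Xb @ Ub.T) / N

# With r_u(0) = 1 the identity predicts Rxu close to C_2(A,B) = [AB  B].
Cb = np.hstack([A @ B, B])
```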
Finally, we give a result for single input ARMA signals.
Theorem 6.5. Assume that the scalar (m = 1) input u(t) to the system in (6.1) is generated by the ARMA model
\[
F(q^{-1})u(t) = G(q^{-1})\varepsilon(t),
\]
where ε(t) is a white noise process. Here, F(z) and G(z) are relatively prime polynomials of degrees n_F and n_G, respectively, and they have all zeros outside the unit circle. Furthermore, assume that A1–A4 hold, that the weighting matrices Wr and Wc are positive definite, and that:

1. (A, B) is reachable.
2. β ≥ n.
3. n_G ≤ β − n.
4. n_F ≤ α + β − n.

Given the above assumptions, the IV-4SID subspace estimate Γ̂α in (6.11) is consistent.
Proof. Study the last two block columns of (6.23). Clearly, if
\[
\operatorname{rank} \bar{E}\left\{
\begin{bmatrix} x(t+\beta) \\ u_\alpha(t+\beta) \end{bmatrix}
\begin{bmatrix} u_\beta(t) \\ u_\alpha(t+\beta) \end{bmatrix}^T
\right\} = n + \alpha m \tag{6.35}
\]
holds, then so does (6.23). Using the notation introduced in (6.29), we can write the correlation matrix in (6.35) as
\[
S \, \bar{E}\left\{
\left[ \frac{1}{a(q^{-1})} u_{n+\alpha}(t+\beta-n) \right]
u_{\alpha+\beta}^T(t)
\right\}, \tag{6.36}
\]
where S is (n + α) × (n + α) and non-singular since (A, B) is reachable (see Appendix A.14). Lemma A3.8 in [SS83] shows that the correlation matrix appearing in (6.36) has full row rank if
\[
\max(\alpha + n + n_G,\; n + n_F) \le \alpha + \beta, \tag{6.37}
\]
which can easily be transformed into the conditions of the theorem.
Notice that the theorem holds for arbitrarily colored noise, as long as it is uncorrelated with the input. With some more effort, the theorem can be extended to the multi-input case by modifying Lemma A3.8 in [SS83]; for example, the theorem holds if the different inputs are independent ARMA signals. The result of the theorem is very interesting, since it quantifies the statement that "for large enough β, the matrix in (6.25) is full rank". Also observe that an AR input of arbitrarily high order is allowed if α is chosen large enough.
6.6 Counterexample

The purpose of this section is to present an example of a system for which the IV-4SID methods are not consistent. We have shown that a necessary condition for consistency is the validity of (6.23). In the discussion following (6.25), it was also pointed out that consistency problems may occur for systems with process noise. In this section, we give an explicit example for which the rank drops when the two terms in (6.25) are added.

Consider the cross-correlation matrix appearing in (6.23), which with obvious notation reads
\[
\begin{bmatrix}
R_{x^d y_\beta^d} + R_{x^s y_\beta^s} & R_{x^d u_\beta} & R_{x^d u_\alpha} \\
R_{u_\alpha y_\beta^d} & R_{u_\alpha u_\beta} & R_{u_\alpha u_\alpha}
\end{bmatrix}.
\]
The construction of a system for which this matrix loses rank is divided into two major steps:
1. Find a vector v such that
\[
v^T \begin{bmatrix} R_{x^d u_\beta} & R_{x^d u_\alpha} \\ R_{u_\alpha u_\beta} & R_{u_\alpha u_\alpha} \end{bmatrix} = 0.
\]
2. Design R_{x^s y_\beta^s} such that
\[
v^T \begin{bmatrix} R_{x^d y_\beta^d} + R_{x^s y_\beta^s} \\ R_{u_\alpha y_\beta^d} \end{bmatrix} = 0. \tag{6.38}
\]
The first step turns out to be similar to a construction used in the analysis of instrumental variable methods (see [SS81, SS83] and the references therein). Once a solution to the first step is found, the second step is a matter of investigating whether or not a stochastic subsystem exists that generates a valid R_{x^s y_\beta^s}.
Let us study the first step in more detail. For notational convenience, define the covariance matrix
\[
P_{\eta\times\nu}(t_1, t_2) = \bar{E}\left\{
\left[ \frac{1}{a(q^{-1})} u_\eta(t - t_1) \right] u_\nu^T(t - t_2)
\right\}.
\]
The correlation matrix of interest in the first step can be written as
\[
\begin{bmatrix} R_{x^d u_\beta} & R_{x^d u_\alpha} \\ R_{u_\alpha u_\beta} & R_{u_\alpha u_\alpha} \end{bmatrix}
= S \, P_{(\alpha+n)\times(\alpha+\beta)}(n - \beta, 0),
\]
where S is defined in (6.29); here, however, it is of dimension (n + αm) × ((α + n)m). The rank properties of P_{η×ν}(0,0) have been studied in [SS83]. P is full rank either if a is positive real and u(t) is persistently exciting, or if u(t) is an ARMA process of sufficiently low order (e.g., white noise; cf. Theorem 6.5). Here, we consider an a(q^{-1}) that is not positive real, in order to make P rank deficient. This implies that n ≥ 2. Let us fix the following parameters: n = 2, m = 1, l = 1, β = 2 and α = 3. If (A, B) is reachable, then S is non-singular (see Appendix A.14). The first step can then be solved if there exists a vector g such that gᵀP_{5×5}(0,0) = 0. A solution can be found by using the same setup as that in the counterexample of [SS83] for an instrumental variable method. Choose a pole polynomial and an input process according to
\[
a(q^{-1}) = (1 - \gamma q^{-1})^2, \tag{6.39}
\]
\[
u(t) = (1 - \gamma q^{-1})^2 (1 + \gamma q^{-1})^2 \varepsilon(t), \tag{6.40}
\]
where ε(t) is a white noise process of unit variance. Notice that γ must have modulus less than one for stability, and that the input is persistently exciting of any order. Now, a straightforward calculation leads to
\[
P_{5\times5}(0,0) = \begin{bmatrix}
1-2\gamma^4 & -4\gamma^3 & \gamma^6-2\gamma^2 & 2\gamma^5 & \gamma^4 \\
2\gamma & 1-2\gamma^4 & -4\gamma^3 & \gamma^6-2\gamma^2 & 2\gamma^5 \\
\gamma^2 & 2\gamma & 1-2\gamma^4 & -4\gamma^3 & \gamma^6-2\gamma^2 \\
0 & \gamma^2 & 2\gamma & 1-2\gamma^4 & -4\gamma^3 \\
0 & 0 & \gamma^2 & 2\gamma & 1-2\gamma^4
\end{bmatrix}.
\]
This matrix is singular if γ is a solution to
\[
\det(P_{5\times5}(0,0)) = 1 + 4\gamma^4 + 10\gamma^8 - 8\gamma^{12} - 15\gamma^{16} - 12\gamma^{20} = 0. \tag{6.41}
\]
The polynomial has two real roots, γ ≈ ±0.91837. Choose, for example, the positive root for γ, and let g be the left eigenvector corresponding to the zero eigenvalue of P. This provides a solution to the first step (namely, vᵀ = gᵀS⁻¹). The solution depends on the matrix B. Until now, we have fixed the poles of the system and a specific input process.
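Equation (6.41) is a polynomial in s = γ⁴, so the critical γ can be verified numerically (a quick check, not part of the original derivation):

```python
import numpy as np

# det(P_5x5(0,0)) = 1 + 4s + 10s^2 - 8s^3 - 15s^4 - 12s^5 with s = gamma^4;
# coefficients are listed highest degree first for numpy.roots.
coeffs = [-12.0, -15.0, -8.0, 10.0, 4.0, 1.0]
s_roots = np.roots(coeffs)

# Keep the real root in (0, 1); the critical gamma is its fourth root.
s = [r.real for r in s_roots if abs(r.imag) < 1e-10 and 0 < r.real < 1]
gamma = s[0] ** 0.25
```

The fourth root of the single real root in (0, 1) reproduces the value γ ≈ 0.91837 quoted in the text.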
Remark 6.3. Observe that a subspace method that uses only the inputs Uβ as instrumental variables (cf. (6.10)) is not consistent for systems satisfying the first step. In other words, the past input MOESP method (see [Ver94]) is not consistent if the poles and the input process are related as in (6.39)–(6.40) and if γ is a solution to (6.41). It is readily checked that the conditions of Theorem 6.5 are violated.
The free parameters in the second step are B, C, D, and those
specifying the stochastic subsystem, with constraints on minimality of
the system description. Somewhat arbitrarily,² we make the choices
B = [1 −2]^T, C = [2 −1] and D = 0. Consider the following description
of the system in state space form:
x(t + 1) = ⎡ 2γ  −γ^2 ⎤ x(t) + ⎡  1 ⎤ u(t) + ⎡ k1 ⎤ e(t),   (6.42a)
           ⎣ 1    0   ⎦        ⎣ −2 ⎦        ⎣ k2 ⎦

y(t) = [ 2  −1 ] x(t) + e(t),   (6.42b)
where the variance of the noise process e(t) is r_e. With the choices made,
the system is observable and reachable from u(t).

²For some choices of B, C and D we were not able to find a solution to the second
step. For other choices of C, making the system "less observable", we found solutions
with lower noise variance.

Figure 6.1: The graph shows η1 as a function of γ and k_rel.

For a given γ, (6.38) now consists of two non-linear equations in k1, k2
and r_e. The equations are, however, linear in k1 r_e, k2 r_e, k1^2 r_e, k2^2 r_e
and k1 k2 r_e. If we change variables according to:

k_rel = k1/k2,
η1 = k2^2 r_e,
η2 = k2 r_e,
then the two non-linear equations become linear in η1 and η2. Thus,
given γ and k_rel, we can easily solve for η1 and η2. Observe that η1 must
be positive to be a valid solution. A plot of η1 as a function of γ and k_rel
is shown in Figure 6.1. The solution η1 is positive when k_rel is between
≈ 0.7955 and (4γ^3 − 3γ^2 + 1)/(2γ^2 − 2γ + 2) ≈ 0.8475 (η1 is discontinuous
at this point). Let us, for example, choose the solution corresponding to
k_rel = 0.825. The parameters become γ ≈ 0.9184, r_e ≈ 217.1, k1 ≈
−0.1542 and k2 ≈ −0.1869. Thus, feeding the system (6.42) with the
input (6.40) and using α = 3 and β = 2 will make the matrix in (6.23)
rank deficient.<br />
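A quick sanity check of the quoted numbers is easy to do; equation (6.38) itself is not reproduced here, so this only verifies the internal consistency of the reported γ, r_e, k1 and k2:

```python
gamma, r_e = 0.9184, 217.1
k1, k2 = -0.1542, -0.1869

k_rel = k1 / k2                      # should reproduce the chosen 0.825
eta1 = k2**2 * r_e                   # must be positive for a valid solution
# Upper end of the interval on which eta1 is positive:
upper = (4*gamma**3 - 3*gamma**2 + 1) / (2*gamma**2 - 2*gamma + 2)
```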
We end this section with some short comments about the counterexample.
Indeed, the example presented in this section is very special.
However, the existence of such examples indicates that subspace
methods may have very poor performance in certain situations.
As should be clear from the derivations above, the consistency of 4SID
methods is related to the consistency of some instrumental variable
schemes (see [SS81, SS83] and the references therein). When using 4SID
methods, it is a good idea to estimate a set of models for different choices
of α and β. These models should then be validated separately to see
which gives the best result. This reduces the performance degradation
due to a bad choice of α and β. In the counterexample presented above,
such a procedure would probably have led to another choice of the dimension
parameters (given that the system order is known). While this may
lead to consistent estimates, the accuracy may still be poor (as indicated
in the next section).
6.7 Numerical Examples<br />
In this section, the previously obtained results are illustrated by means
of numerical examples. First, the counterexample given in the previous
section is simulated. Thus, identification data are generated according
to (6.40) and (6.42). The extended observability matrix is estimated by
(6.11) using the weighting matrices for the CVA method (see Table 6.1).
Similar results are obtained with other choices of the weighting matrices.
The system matrix is then estimated as

Â = Γ̂1^† Γ̂2,

where (·)^† denotes the Moore-Penrose pseudo-inverse, Γ̂1 denotes the
matrix made of the first α − 1 rows, and Γ̂2 contains the last α − 1 rows
of Γ̂α. To measure the estimation accuracy we consider the estimated
system poles (i.e., the eigenvalues of Â). Recall that the system (6.42)
has two poles at γ = 0.9184. Table 6.3 shows the absolute value of
the bias and the standard deviation of the pole estimates. The sample
statistics are based on 1000 independent trials. Each trial consists of 1000
input-output data samples. The results for two different choices of α and
β are given in the table. The first corresponds to the choice made in the
counterexample. It is clearly seen that the estimates are not useful. For
         α = 3, β = 2        α = 6, β = 5
pole 1   5.9572 (69.8826)    0.4704 (0.4806)
pole 2   2.6049 (30.9250)    0.1770 (0.3526)

Table 6.3: The figures shown in the table are the absolute values of the
bias and, within brackets, the standard deviations.
         α = 3, β = 2       α = 6, β = 5
pole 1   0.1161 (0.2405)    0.0316 (0.0452)
pole 2   0.2266 (0.5043)    0.0282 (0.0290)

Table 6.4: Same as Table 6.3 but for the system with K = [−0.2100, −0.5590]^T.
the second choice (α = 6, β = 5), it can be shown that the algorithm<br />
is consistent. This is <strong>in</strong>dicated by the much more reasonable results <strong>in</strong><br />
Table 6.3. The accuracy, however, is still very poor s<strong>in</strong>ce the matrix <strong>in</strong><br />
(6.23) is nearly rank deficient.<br />
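The pole-estimation step used in these simulations (Â = Γ̂1^† Γ̂2 followed by an eigenvalue decomposition) can be sketched as follows; applied to the exact, noise-free observability matrix of (6.42) with α = 3 it recovers the double pole at γ (function and variable names are ours):

```python
import numpy as np

def poles_from_observability(Gamma):
    """Shift invariance: Gamma1 A = Gamma2, so A_hat = pinv(Gamma1) Gamma2,
    and the pole estimates are the eigenvalues of A_hat (single-output
    case, so Gamma1/Gamma2 are the first/last alpha - 1 rows)."""
    A_hat = np.linalg.pinv(Gamma[:-1]) @ Gamma[1:]
    return np.linalg.eigvals(A_hat)

g = 0.9184
A = np.array([[2*g, -g**2], [1.0, 0.0]])     # companion form of (1 - g q^-1)^2
C = np.array([[2.0, -1.0]])
Gamma3 = np.vstack([C, C @ A, C @ A @ A])    # extended observability, alpha = 3
poles = poles_from_observability(Gamma3)     # both poles approx equal to g
```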
Next, we study the influence of the stochastic sub-system on the estimation
accuracy. Recall that in Section 6.6 the stochastics of the system
were designed to cause a cancellation in (6.25). The zeros of the stochastic
sub-system are given by the eigenvalues of A − KC, which become,
approximately, 0.9791 ± i0.1045. In the next simulation, the locations of
these zeros are changed such that they have the same distance to the unit
circle, but the angle relative to the real axis is increased five times. This
is accomplished by changing K in (6.42) to [−0.2100, −0.5590]^T. The
zeros become 0.8489 ± i0.4990. The simulation results are presented in
Table 6.4. It can be seen that the estimation accuracy is more reasonable
in this case.
Finally, we change the input to be realizations of a zero-mean white
noise process. The power of the white noise input is chosen to be equal
to the power of the colored process (6.40). Except for the input, the
parameters are the same as in the counterexample. Recall Theorem 6.4,
which shows that the estimates are consistent in this case. Table 6.5
presents the result of the simulations. A comparison with the results of
Table 6.3 clearly shows a great difference.
         α = 3, β = 2       α = 6, β = 5
pole 1   0.0464 (0.1146)    0.0184 (0.0271)
pole 2   0.0572 (0.0523)    0.0182 (0.0171)

Table 6.5: Same as Table 6.3 but with a white noise input signal.
6.8 Conclusions<br />
Persistence of excitation conditions on the input signal have been given.
These conditions ensure consistency of the subspace estimate used in
subspace system identification methods. For these conditions to hold, we
had to assume that the system does not contain any process noise. With
process noise, a persistence of excitation condition alone is not sufficient.
In fact, an example was given of a system for which subspace methods are
not consistent. The system is reachable from the input and the input is
persistently exciting to any order. The subspace methods fail, however,
to provide a consistent model of the system transfer function. It was also
shown that a white noise input (or an ARMA input signal of sufficiently
low order) guarantees consistency under weak conditions. This shows
that the performance of subspace methods may be very sensitive to the
input excitation.
Appendix A

Proofs and Derivations

A.1 Proofs of (2.34) and (2.35)
By R8,

(λk − σ^2) B∗ êk ≃ B∗ R̂ ek,   (A.1)

where (see (2.8) and (2.13))

B∗ R̂ = (1/2N) B∗ Σ_{t=1}^{N} [ n(t) x∗(t) + J n^c(t) x^T(t) J ].   (A.2)
Using (A.2) and the standard formula for the fourth-order moment of
circular Gaussian random variables, we obtain

E{ [B∗ R̂ ei][B∗ R̂ ej]∗ }
= (1/4N^2) B∗ Σ_{t=1}^{N} Σ_{s=1}^{N} E{ n(t) x∗(t) ei ej∗ x(s) n∗(s)
  + n(t) x∗(t) ei ej∗ J x^c(s) n^T(s) J + J n^c(t) x^T(t) J ei ej∗ x(s) n∗(s)
  + J n^c(t) x^T(t) J ei ej∗ J x^c(s) n^T(s) J } B
= (1/4N^2) B∗ { N^2 σ^4 ei ej∗ + N σ^2 (ej∗ R0 ei) I + σ^4 N^2 ei ej∗ + σ^4 N J ej^c ei^T J
  + σ^4 N^2 ei ej∗ + σ^4 N J ej^c ei^T J + N^2 σ^4 ei ej∗ + σ^2 N (ej∗ J R0^c J ei) I } B
= (σ^2/2N) (ej∗ R ei) B∗ B = (σ^2/2N) λi δ_{i,j} (B∗ B),   (A.3)

where to derive the fourth equality use was made of the fact that B∗ ei = 0
(see (2.13) and (2.22)) and that B∗ J ei^c = 0 (see R3). The expression for
E[zz∗] in (2.34) follows from (A.1) and (A.3).
Next, consider (2.35). A straightforward calculation gives

E{ [B∗ R̂ ei][B∗ R̂ ej]^T }
= (1/4N^2) B∗ Σ_{t=1}^{N} Σ_{s=1}^{N} E{ n(t) x∗(t) ei ej^T x^c(s) n^T(s)
  + n(t) x∗(t) ei ej^T J x(s) n∗(s) J + J n^c(t) x^T(t) J ei ej^T x^c(s) n^T(s)
  + J n^c(t) x^T(t) J ei ej^T J x(s) n∗(s) J } B^c
= (1/4N^2) B∗ { N^2 σ^4 ei ej^T + σ^4 N ej ei^T + σ^4 N^2 ei ej^T + σ^2 N J (ej^T J R0 ei)
  + σ^4 N^2 ei ej^T + σ^2 N J (ej^T R0^c J ei) + σ^4 N^2 ei ej^T + σ^4 N ej ei^T } B^c
= (σ^2/2N) (B∗ J B^c)(ej^T J R ei)
= (σ^2/2N) (B∗ B J̃)(ej∗ R ei) e^{−iψj}
= (σ^2/2N) (B∗ B J̃) λi δ_{i,j} Γii∗,   (A.4)
where the last but one equality follows from R1 and R3. The expression
in (2.35) for E[zz^T] follows from (A.1) and (A.4).
A.2 Asymptotic Covariance of the Residual<br />
In order to compute the asymptotic covariance in (3.21), the first order
perturbations in the residuals are related to the error in the sample covariance
matrix. Since R̃ = R̂ − R = Op(1/√N), it is straightforward to
obtain the following first order relation from (3.4):

B∗ Ês = B∗ R̃ Es Λ̃^{-1} + op(1/√N),   (A.5)
where B = B(θ0) (see Section 2.3.4 or [CTO89]). Next notice that the
observations are given by the model

x(t) = A(θ0, ρ) s(t) + n(t) = A0 s(t) + n(t) + Ã s(t),   (A.6)

where Ã = A(θ0, ρ) − A0 denotes the error in the steering matrix due
to ρ. It is important to observe that the first two terms in (A.6) represent
the observation from the nominal array, whereas the third term is
due to model errors. The sample covariance matrix can be partitioned
accordingly as
R̂ = (1/N) Σ_{t=1}^{N} x(t) x∗(t) = R̂_FS + R̂_ME,   (A.7)

where

R̂_FS = (1/N) Σ_{t=1}^{N} [A0 s(t) + n(t)][A0 s(t) + n(t)]∗,   (A.8)

R̂_ME = (1/N) Σ_{t=1}^{N} { [A0 s(t) + n(t)] s∗(t) Ã∗ + Ã s(t) [A0 s(t) + n(t)]∗ + Ã s(t) s∗(t) Ã∗ },   (A.9)

and the subscripts FS and ME stand for finite sample and model error,
respectively. The in-probability orders of the quantities in (A.8)-(A.9)
are

Ã = Op(1/√N),
(1/N) Σ_{t=1}^{N} s(t) s∗(t) − P = Op(1/√N),
(1/N) Σ_{t=1}^{N} s(t) n∗(t) = Op(1/√N).
The first follows from the assumptions on ρ, the second is a standard
result from ergodicity theory, and the last follows from the central limit
theorem (it is a sum of zero-mean independent random variables). A first
order approximation of B∗ R̂ is then

B∗ R̂ = B∗ R̂_FS + B∗ Ã P A0∗ + op(1/√N).   (A.10)
It is interesting to note that the residuals are, to first order, due to
two independent terms. The first is the finite sample errors from the
nominal array and the second is the model error contribution. Since
the terms are independent, they can be considered one at a time. In
view of (A.7), we denote the error in the principal eigenvectors due to
finite samples and model errors by Ẽs_FS and Ẽs_ME, respectively; that is,

Ês = Es + Ẽs_FS + Ẽs_ME.
Let us start by deriving the part of the residual covariance matrix due
to finite samples. Define R̃_FS = R̂_FS − R. By straightforward calculations
it is possible to show that

E{vec(R̃_FS) vec∗(R̃_FS)} = (1/N) (R^T ⊗ R),   (A.11)

E{vec(R̃_FS) vec^T(R̃_FS)} = (1/N) [vec(R) vec^T(R)]^BT,   (A.12)

since {x(t)}_{t=1}^{N} is a sequence of independent circular Gaussian random
vectors. Here, the notation (·)^BT means that every "block" in (·) is
transposed. (It should be clear from the context what a "block" is.) Recalling
the first order equality (A.5) and making use of (A.11), we get

C_εFS ≜ lim_{N→∞} N E{vec(B∗ Ẽs_FS) vec∗(B∗ Ẽs_FS)}
= lim_{N→∞} N E{vec(B∗ R̃_FS Es Λ̃^{-1}) vec∗(B∗ R̃_FS Es Λ̃^{-1})}
= lim_{N→∞} N [(Es Λ̃^{-1})^T ⊗ B∗] E{vec(R̃_FS) vec∗(R̃_FS)} [(Es Λ̃^{-1})^c ⊗ B]
= [(Es Λ̃^{-1})^T ⊗ B∗] (R^T ⊗ R) [(Es Λ̃^{-1})^c ⊗ B]
= (Λ̃^{-1} Es^T R^T Es^c Λ̃^{-1}) ⊗ (B∗ R B)
= (Λ̃^{-2} Λs) ⊗ (σ^2 B∗ B).   (A.13)
In the third equality we used the formula in (3.8). Similarly, from (A.5)
and (A.12), we have

C̄_εFS ≜ lim_{N→∞} N E{vec(B∗ Ẽs_FS) vec^T(B∗ Ẽs_FS)}
= lim_{N→∞} N [(Es Λ̃^{-1})^T ⊗ B∗] E{vec(R̃_FS) vec^T(R̃_FS)} [(Es Λ̃^{-1}) ⊗ B^c]
= [(Es Λ̃^{-1})^T ⊗ B∗] [vec(R) vec^T(R)]^BT [(Es Λ̃^{-1}) ⊗ B^c]
= [(Es Λ̃^{-1})^T ⊗ B∗] [vec(σ^2 Im) vec^T(σ^2 Im)]^BT [(Es Λ̃^{-1}) ⊗ B^c]
= σ^4 [(Es Λ̃^{-1})^T ⊗ B∗] [B^c ⊗ (Es Λ̃^{-1})] [vec(Im) vec^T(I_{d'})]^BT
= 0.

The last two equalities follow from property 3 and property 2 of the
block-transpose operation derived in Appendix A.4.
Remark A.1. The fact that C̄_εFS = 0 may be more easily realized using
the asymptotic distribution of the eigenvectors (e.g. [KB86]). It is then
easy to verify that E{B∗ êk êl^T B^c} = 0, where êk is column k of Ês.
Next, we proceed and study the model error contribution to the residual
covariance matrix. Recall (A.10), (A.5), and that

vec(Ã) = Dρ ρ̃ + op(1/√N).

We have

C_εME ≜ lim_{N→∞} N E{vec(B∗ Ẽs_ME) vec∗(B∗ Ẽs_ME)}
= lim_{N→∞} N E{vec(B∗ Ã P A0∗ Es Λ̃^{-1}) vec∗(B∗ Ã P A0∗ Es Λ̃^{-1})}
= lim_{N→∞} N (T^T ⊗ B∗) E{vec(Ã) vec∗(Ã)} (T^c ⊗ B)
= (T^T ⊗ B∗) Dρ Ω̄ Dρ∗ (T^c ⊗ B),   (A.14)

where

T = A0^† Es

and the following relation is used (cf. (3.1)):

P = A0^† Es Λ̃ Es∗ A0^{†∗}.

Similarly,

C̄_εME ≜ lim_{N→∞} N E{vec(B∗ Ẽs_ME) vec^T(B∗ Ẽs_ME)}
= lim_{N→∞} N E{vec(B∗ Ã P A0∗ Es Λ̃^{-1}) vec^T(B∗ Ã P A0∗ Es Λ̃^{-1})}
= lim_{N→∞} N (T^T ⊗ B∗) E{vec(Ã) vec^T(Ã)} (T ⊗ B^c)
= (T^T ⊗ B∗) Dρ Ω̄ Dρ^T (T ⊗ B^c).   (A.15)
Hence, the asymptotic covariance matrix of the residuals in (3.21) is given
by

C_ε̄ = ⎡ Cε     C̄ε   ⎤
       ⎣ C̄ε^c   Cε^c ⎦ ,

with

Cε = C_εFS + C_εME,
C̄ε = C̄_εME,

where C_εFS, C_εME and C̄_εME are given by (A.13), (A.14) and (A.15),
respectively.
A.3 Proof of Theorem 3.2<br />
Recall the expression (3.31):

V'_i ≃ 2 ki∗ W ε̄.

Differentiating with respect to θj and letting N tend to infinity leads to

Hij = 2 ki∗ W kj.
Next, study the (i,j)th element of Q in (3.32):

Qij = lim_{N→∞} N E{V'_i V'_j}
= lim_{N→∞} N E{4 ki∗ W ε̄ ε̄∗ W kj}
= 4 ki∗ W C_ε̄ W kj.

Defining the matrix

K = [ k1 … kd ],

the matrices H and Q can be written in compact form as

H = 2 K∗ W K,   (A.16)
Q = 4 K∗ W C_ε̄ W K.   (A.17)
These expressions are valid for any W satisfying (3.29). It remains to
prove that H^{-1} Q H^{-1} equals N CRBθ for the choice W = W_GWSF given
in (3.25). Inserting W = C_ε̄^{-1} in (A.16) and (A.17) gives

H^{-1} Q H^{-1} = (K∗ C_ε̄^{-1} K)^{-1}.
Next, we rewrite ki as

ki = ⎡ vec(Bi∗ Es)    ⎤ = ⎡ vec(Bi∗ A T)      ⎤
     ⎣ vec(Bi^T Es^c) ⎦   ⎣ vec(Bi^T A^c T^c) ⎦
   = − ⎡ vec(B∗ Ai T)      ⎤ = − ⎡ (T^T ⊗ B∗) vec(Ai)     ⎤
       ⎣ vec(B^T Ai^c T^c) ⎦     ⎣ (T^T ⊗ B∗)^c vec(Ai^c) ⎦ ,

and hence

K = − ⎡ (T^T ⊗ B∗) Dθ     ⎤
      ⎣ (T^T ⊗ B∗)^c Dθ^c ⎦ .   (A.18)
In (A.18) we made use of the fact that Bi∗ A = −B∗ Ai, which follows
from B∗ A = 0. We also rewrite the inverse of C_ε̄ using the "matrix
inversion lemma":

C_ε̄^{-1} = (L̄ + Ḡ Ḡ∗)^{-1} = L̄^{-1} − L̄^{-1} Ḡ (Ḡ∗ L̄^{-1} Ḡ + I)^{-1} Ḡ∗ L̄^{-1}.   (A.19)
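This special case of the matrix inversion (Woodbury) lemma is easy to verify numerically for a Hermitian positive definite L̄ and a complex "tall" Ḡ; the matrices below are random and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 6, 2
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
L = X @ X.conj().T + n * np.eye(n)           # Hermitian positive definite
G = rng.standard_normal((n, r)) + 1j * rng.standard_normal((n, r))

Li = np.linalg.inv(L)
lhs = np.linalg.inv(L + G @ G.conj().T)
rhs = Li - Li @ G @ np.linalg.inv(G.conj().T @ Li @ G + np.eye(r)) @ G.conj().T @ Li
```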
In view of this, the product K∗ C_ε̄^{-1} K contains two terms. In the sequel
we consider these two separately and show that they correspond to the
two terms in the expression for CRBθ^{-1}/N in (3.15).
Let us first study

K∗ L̄^{-1} K = ⎡ (T^T ⊗ B∗) Dθ     ⎤∗ ⎡ L^{-1}  0      ⎤ ⎡ (T^T ⊗ B∗) Dθ     ⎤
              ⎣ (T^T ⊗ B∗)^c Dθ^c ⎦  ⎣ 0       L^{-c} ⎦ ⎣ (T^T ⊗ B∗)^c Dθ^c ⎦
= 2 Re{ Dθ∗ [ T^c σ^{-2} Λ̃^2 Λs^{-1} T^T ⊗ B (B∗B)^{-1} B∗ ] Dθ }
= 2σ^{-2} Re{Dθ∗ M Dθ} = 2σ^{-2} C.   (A.20)

Here, M is defined in (3.16) and we used the fact that B (B∗B)^{-1} B∗ = Π⊥_A.
Thus, we have shown that (A.20) equals the first term in CRBθ^{-1}/N.
To simplify the second term, we first need some preliminary calculations.
Consider the inverse appearing in (A.19),

(Ḡ∗ L̄^{-1} Ḡ + I)^{-1}
= ( [ G∗  G^T ] ⎡ L^{-1}  0      ⎤ ⎡ G   ⎤ + I )^{-1}
                ⎣ 0       L^{-c} ⎦ ⎣ G^c ⎦
= ( 2 Re{G∗ L^{-1} G} + I )^{-1}
= ( 2 Re{ Ω̄^{1/2} Dρ∗ (T^c ⊗ B) [ σ^{-2} Λ̃^2 Λs^{-1} ⊗ (B∗B)^{-1} ] (T^T ⊗ B∗) Dρ Ω̄^{1/2} } + I )^{-1}
= (σ^2/2) Ω̄^{-1/2} [ Re{Dρ∗ M Dρ} + (σ^2/2) Ω̄^{-1} ]^{-1} Ω̄^{-1/2}
= (σ^2/2) Ω̄^{-1/2} Γ^{-1} Ω̄^{-1/2},   (A.21)
where Γ is defined in (3.18). We also need to compute

K∗ L̄^{-1} Ḡ = − ⎡ (T^T ⊗ B∗) Dθ     ⎤∗ ⎡ L^{-1}  0      ⎤ ⎡ G   ⎤
                ⎣ (T^T ⊗ B∗)^c Dθ^c ⎦  ⎣ 0       L^{-c} ⎦ ⎣ G^c ⎦
= −2 Re{ Dθ∗ [ T^c σ^{-2} Λ̃^2 Λs^{-1} T^T ⊗ B (B∗B)^{-1} B∗ ] Dρ Ω̄^{1/2} }
= −2σ^{-2} Fθ^T Ω̄^{1/2},   (A.22)

where Fθ is defined in (3.17). Combining (A.22), (A.21) and (A.19), we
see that the second term in the product K∗ C_ε̄^{-1} K contributes with

−2σ^{-2} Fθ^T Ω̄^{1/2} · (σ^2/2) Ω̄^{-1/2} Γ^{-1} Ω̄^{-1/2} · Ω̄^{1/2} Fθ · 2σ^{-2} = −2σ^{-2} Fθ^T Γ^{-1} Fθ,

which is exactly equal to the second term in CRBθ^{-1}/N (see (3.15)).
A.4 Some Properties of the Block Transpose<br />
This section contains three useful properties related to the block transpose
operation. Let A_{i×j} denote an arbitrary i × j matrix and define the
operator

Lmn = [vec(Im) vec^T(In)]^BT,

which has dimension mn × mn. Observe that Lmn ≠ Lnm unless m = n.
Property A.1.<br />
Lmn vec(Am×n) = vec(A T m×n)<br />
Proof. This is easily proven by direct verification.<br />
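In modern terminology Lmn is the commutation matrix. A small numpy sketch (our own, with vec the usual column-stacking operator) makes the direct verification concrete:

```python
import numpy as np

def vec(A):
    # Column-stacking vec operator.
    return A.flatten(order="F")

def block_transpose(X, p, q):
    """The (.)^BT operation: transpose every p x q block of X."""
    return np.block([[X[i*p:(i+1)*p, j*q:(j+1)*q].T
                      for j in range(X.shape[1] // q)]
                     for i in range(X.shape[0] // p)])

def L(m, n):
    # L_mn = [vec(I_m) vec^T(I_n)]^BT, an mn x mn matrix.
    return block_transpose(np.outer(vec(np.eye(m)), vec(np.eye(n))), m, n)

A = np.arange(12.0).reshape(3, 4)
# Property A.1: L_mn vec(A) = vec(A^T) for this 3 x 4 matrix A.
```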
Property A.2.

Lmn (A_{n×d} ⊗ B_{m×p}) = (B_{m×p} ⊗ A_{n×d}) Lpd

Proof. Let e be an arbitrary dp-vector and let E be a p × d matrix defined
via vec(E) = e. Then

Lmn (A_{n×d} ⊗ B_{m×p}) e = Lmn vec(B E A^T)
= vec(A E^T B^T)
= (B ⊗ A) vec(E^T)
= (B ⊗ A) Lpd vec(E),

and the result follows.

Property A.3.

[ vec(A_{m×n} B_{n×p}) vec^T(C_{q×r} D_{r×s}) ]^BT = (Ip ⊗ C) [ vec(B) vec^T(D) ]^BT (Is ⊗ A^T)

Proof. Let A_{•i} denote column i of A. Consider block ij (i = 1,…,p;
j = 1,…,s) of the left hand side:

[ A B_{•i} (C D_{•j})^T ]^T = C (B_{•i} D_{•j}^T)^T A^T,

which is readily seen to equal block ij of the right hand side.
A.5 Proof of Theorem 4.1<br />
It is well known that the CRB is given by the inverse of the Fisher
information matrix (FIM). Element (k,l) of the normalized FIM is given
by Bangs' formula [Ban71]:

(1/N) FIMkl = Tr{R^{-1} Rk R^{-1} Rl},

where the notation Rk refers to the partial derivative of R with respect
to the k:th parameter in η. Using the formula Tr{ABCD} =
vec∗(B∗) (A^T ⊗ C) vec(D), we get

(1/N) FIMkl = vec∗(Rk) (R^{-T} ⊗ R^{-1}) vec(Rl).
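The trace-vec identity, and hence the vectorized form of Bangs' formula, can be verified numerically for complex matrices; here (·)∗ is the conjugate transpose and vec stacks columns (random matrices, purely illustrative):

```python
import numpy as np

def vec(X):
    return X.flatten(order="F").reshape(-1, 1)   # column-stacking vec

rng = np.random.default_rng(1)
cx = lambda a, b: rng.standard_normal((a, b)) + 1j * rng.standard_normal((a, b))

p, q, r, s = 2, 3, 4, 5
A, B, C, D = cx(p, q), cx(q, r), cx(r, s), cx(s, p)

# Tr{ABCD} = vec*(B*) (A^T kron C) vec(D)
lhs = np.trace(A @ B @ C @ D)
rhs = (vec(B.conj().T).conj().T @ np.kron(A.T, C) @ vec(D)).item()
```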
The covariance matrix is given by

R(η) = A(θ) P A∗(θ) + σ^2 I = Σ_{i=1}^{d} a(θi) pi a∗(θi) + σ^2 I
and, hence, the partial derivatives are

Rθi ≜ ∂R/∂θi = (di ai∗ + ai di∗) pi,
Rpi ≜ ∂R/∂pi = ai ai∗,
Rσ² ≜ ∂R/∂σ² = I,

where the notation ai = a(θi) and di = ∂ai/∂θi has been introduced.
Now, define the matrices
Dθ = [ vec(Rθ1) ⋯ vec(Rθd) ] = D P,   (A.23)
Dp = [ vec(Rp1) ⋯ vec(Rpd) ] = (A^c ◦ A) = B,
Dσ² = vec(Rσ²) = vec(I) ≜ i,

and also

Dn = [ Dp  Dσ² ],   (A.24)
WR = R^{-T} ⊗ R^{-1},
X = Dθ∗ WR Dθ,
Y = Dθ∗ WR Dn,
Z = Dn∗ WR Dn,

where the matrix D in (A.23) is defined in (4.15). With the notation
introduced above, note that the Fisher information matrix reads

FIM = ⎡ X   Y ⎤
      ⎣ Y∗  Z ⎦ .
The bound on the DOAs is given by the upper left d × d corner of the
CRB matrix; that is, the upper left corner of the inverse FIM. Using the
formulas for inverting partitioned matrices, the following expression is
obtained:

CRBθ^{-1} = X − Y Z^{-1} Y∗
= Dθ∗ WR Dθ − Dθ∗ WR Dn (Dn∗ WR Dn)^{-1} Dn∗ WR Dθ
= Dθ∗ WR^{1/2} Π⊥_{WR^{1/2} Dn} WR^{1/2} Dθ.   (A.25)
In what follows, the expression for CRBθ^{-1} is rewritten in a form suitable
for establishing the equality of the asymptotic estimation error covariance
of the proposed estimator and CRBθ.

Recall from (A.24) that WR^{1/2} Dn = WR^{1/2} [ B  i ] and use the projection
decomposition theorem to write

Π⊥_{WR^{1/2} Dn} = Π⊥_{BW} − Π_{Π⊥_{BW} iW},   (A.26)

where

BW = WR^{1/2} B,
iW = WR^{1/2} i.
Let the columns of G form a basis for the null-space of B∗ (this matter
is further dwelled upon in Section 4.6). Observe that

Π⊥_{BW} = Π_{WR^{-1/2} G}.   (A.27)

Using (A.26) and (A.27), the expression (A.25) can be rewritten to yield

CRBθ^{-1} = Dθ∗ WR^{1/2} [ Π_{WR^{-1/2} G} − Π_{Π_{WR^{-1/2} G} iW} ] WR^{1/2} Dθ
= Dθ∗ G (G∗ WR^{-1} G)^{-1} G∗ Dθ
  − [ Dθ∗ G (G∗ WR^{-1} G)^{-1} G∗ i i∗ G (G∗ WR^{-1} G)^{-1} G∗ Dθ ]
    / [ i∗ G (G∗ WR^{-1} G)^{-1} G∗ i ].   (A.28)

Next it is proven that

Dθ∗ G (G∗ WR^{-1} G)^{-1} G∗ vec(Π⊥_A) = 0   (A.29)

and that

i∗ G (G∗ WR^{-1} G)^{-1} G∗ i = (m − d)/σ^4 + vec∗(Π_A) G (G∗ WR^{-1} G)^{-1} G∗ vec(Π_A).   (A.30)

Starting with (A.30):

i∗ G (G∗ WR^{-1} G)^{-1} G∗ i = i∗ WR^{1/2} Π_{WR^{-1/2} G} WR^{1/2} i
= i∗ WR^{1/2} Π⊥_{WR^{1/2} B} WR^{1/2} i
= i∗ WR i − i∗ WR B (B∗ WR B)^{-1} B∗ WR i.   (A.31)
Observe that i = vec(Π_A) + vec(Π⊥_A) and

B∗ WR vec(Π⊥_A) = B∗ vec(R^{-1} Π⊥_A R^{-1}) = L∗ (A^T ⊗ A∗) vec(Π⊥_A) σ^{-4} = 0.   (A.32)

Using (A.32) in (A.31) leads to

i∗ WR i − vec∗(Π_A) WR B (B∗ WR B)^{-1} B∗ WR vec(Π_A)
= i∗ WR i − vec∗(Π_A) WR^{1/2} Π⊥_{WR^{-1/2} G} WR^{1/2} vec(Π_A)
= i∗ WR i − vec∗(Π_A) WR vec(Π_A) + vec∗(Π_A) G (G∗ WR^{-1} G)^{-1} G∗ vec(Π_A)
= Tr{R^{-2} − R^{-1} Π_A R^{-1} Π_A} + vec∗(Π_A) G (G∗ WR^{-1} G)^{-1} G∗ vec(Π_A)
= (m − d)/σ^4 + vec∗(Π_A) G (G∗ WR^{-1} G)^{-1} G∗ vec(Π_A),

which is (A.30). Consider next (A.29):
Dθ∗ G (G∗ WR^{-1} G)^{-1} G∗ vec(Π⊥_A)
= Dθ∗ WR^{1/2} Π⊥_{BW} WR^{1/2} vec(Π⊥_A)
= Dθ∗ WR vec(Π⊥_A) − Dθ∗ WR B (B∗ WR B)^{-1} B∗ WR vec(Π⊥_A),

where WR vec(Π⊥_A) = σ^{-4} vec(Π⊥_A) in the first term and the second
term is zero by (A.32). Hence,

Dθ∗ G (G∗ WR^{-1} G)^{-1} G∗ vec(Π⊥_A) = σ^{-4} P L∗ [ (D̄^T ⊗ A∗) + (A^T ⊗ D̄∗) ] vec(Π⊥_A) = 0,

where D̄ is defined in (4.16).
Making use of (A.29) and (A.30) in (A.28) yields

CRBθ^{-1} = Dθ∗ G [ (G∗ WR^{-1} G)^{-1}
  − (G∗ WR^{-1} G)^{-1} G∗ vec(Π_A) vec∗(Π_A) G (G∗ WR^{-1} G)^{-1}
    / ( (m − d)/σ^4 + vec∗(Π_A) G (G∗ WR^{-1} G)^{-1} G∗ vec(Π_A) ) ] G∗ Dθ
= Dθ∗ G [ G∗ WR^{-1} G + (σ^4/(m − d)) G∗ vec(Π_A) vec∗(Π_A) G ]^{-1} G∗ Dθ
= Dθ∗ G [ G∗ ( WR^{-1} + (σ^4/(m − d)) vec(Π_A) vec∗(Π_A) ) G ]^{-1} G∗ Dθ,
where the matrix <strong>in</strong>version lemma has been used to obta<strong>in</strong> the last but<br />
one equality. This concludes the proof.<br />
A.6 Rank of B(θ)<br />
In this appendix it is shown that rank{B(θ)} = d as long as m > d/2.
Let θ = [θ1, … , θd]^T, and consider the following general form of the
steering matrix:

A(θ) = ⎡ 1            …  1           ⎤
       ⎢ f1(θ1)       …  f1(θd)      ⎥
       ⎢ ⋮                ⋮          ⎥
       ⎣ f_{m−1}(θ1)  …  f_{m−1}(θd) ⎦ ,

where, without loss of generality, the response from the first sensor is
normalized to one. The array is assumed to be unambiguous, which implies
that A has full rank for d = m and θi ≠ θj, i ≠ j. It follows from (4.6)
that B(θ) can be written as

B(θ) = (A^c ◦ A) = ⎡ A          ⎤
                   ⎢ A Φ1       ⎥
                   ⎢ ⋮          ⎥
                   ⎣ A Φ_{m−1}  ⎦ ,   (A.33)

where Φi = diag[ fi^c(θ1) … fi^c(θd) ]. The notation diag[v] refers to a
matrix with the entries in the vector v as diagonal elements. From (A.33)
it is obvious that rank{B} = d if d ≤ m since A is full rank. Let us next
study (non-zero) solutions x ∈ C^{d×1} to B(θ) x = 0. Any such vector
must satisfy

A x = 0,
A Φ1 x = 0,
⋮
A Φ_{m−1} x = 0.   (A.34)

A necessary condition for the existence of a solution x ≠ 0 is d > m.
It is also important to notice that at least m + 1 elements in x have to
be different from zero since the columns of A are linearly independent.
Since the dimension of the null-space of A is d − m, there are no solutions to (A.34) if
\[
\operatorname{rank}\left\{ \begin{bmatrix} x & \Phi_1 x & \cdots & \Phi_{m-1} x \end{bmatrix} \right\} > d - m. \tag{A.35}
\]
However, it is straightforward to verify that the matrix in (A.35) can be written as diag{x}A*. The rank of diag{x}A* is m, since d > m and since x has at least m + 1 nonzero elements. Hence, no solution x ≠ 0 exists if m > d − m, or equivalently m > d/2, and we conclude that rank{B(θ)} = d if m > d/2.
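The rank result can be illustrated numerically. The example below assumes a hypothetical uniform linear array with responses \(f_k(\theta) = e^{j\pi k \sin\theta}\) (not specified in the text, chosen only for illustration), and checks that B(θ) = Aᶜ ∘ A has rank d even though d > m, as long as m > d/2:

```python
import numpy as np

m, d = 3, 4                                # d > m, yet m > d/2
theta = np.array([-0.4, -0.1, 0.2, 0.5])   # distinct DOAs (radians)

# Hypothetical ULA response: f_k(theta) = exp(j*pi*k*sin(theta)), k = 0..m-1,
# so the first-sensor response is normalized to one, as in the text.
A = np.exp(1j * np.pi * np.arange(m)[:, None] * np.sin(theta)[None, :])

# B = A^c o A: columnwise Kronecker (Khatri-Rao) product, cf. (A.33).
B = np.column_stack([np.kron(A[:, i].conj(), A[:, i]) for i in range(d)])

print(np.linalg.matrix_rank(B))  # 4, i.e. rank{B} = d although A has only m = 3 rows
```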
A.7 Derivation of the Asymptotic Distribution
We will start with the derivation of Q defined in (4.24). From (4.19) and the fact that B†f = p, we have
\[
V_k' = -2\operatorname{Re}\{p^* B_k^* \Pi_B^\perp W \Pi_B^\perp \tilde f\} = -2\operatorname{Re}\{u_k^* \tilde f\},
\]
where u_k is recognized as the k:th column of U in (4.27). From (4.24) it is evident that element (k, l) of Q is given by
\[
\begin{aligned}
Q_{kl} &= \lim_{N\to\infty} 4N\, \mathrm{E}\{\operatorname{Re}\{u_k^* \tilde f\}\operatorname{Re}\{u_l^* \tilde f\}\} \\
&= \lim_{N\to\infty} 2N\, \mathrm{E}\{\operatorname{Re}\{u_k^* \tilde f \tilde f^* u_l + u_k^* \tilde f \tilde f^T u_l^c\}\} \\
&= 2\operatorname{Re}\{u_k^* C u_l + u_k^* \bar C u_l^c\},
\end{aligned}
\]
which is element (k, l) of the expression in (4.26). Next, consider H as defined in (4.18). From (4.19) we get, for large N,
\[
V_{kl}'' = -2\operatorname{Re}\{p^* B_{kl}^* \Pi_B^\perp W \Pi_B^\perp \hat f\}
 - 2\operatorname{Re}\{p^* B_k^* \{\Pi_B^\perp\}_l W \Pi_B^\perp \hat f\}
 - 2\operatorname{Re}\{p^* B_k^* \Pi_B^\perp W \{\Pi_B^\perp\}_l \hat f\},
\]
and using (4.20) we obtain
\[
H_{kl} = 2\operatorname{Re}\{p^* B_k^* \Pi_B^\perp W \Pi_B^\perp B_l p\},
\]
which is element (k, l) of the matrix in (4.25).
It remains to compute the covariance matrices C and C̄ defined in (4.28) and (4.29), respectively. Since {x(t)} is assumed to be zero-mean Gaussian and temporally white, it can be shown that
\[
\mathrm{E}\{\hat R_{kl} \hat R_{pq}\} = R_{kl} R_{pq} + \frac{1}{N} R_{kq} R_{pl}. \tag{A.36}
\]
From (A.36), the following two expressions are readily checked:
\[
\mathrm{E}\{\operatorname{vec}(\tilde R)\operatorname{vec}^*(\tilde R)\} = \frac{1}{N}\left( R^T \otimes R \right), \tag{A.37}
\]
\[
\mathrm{E}\{\operatorname{vec}(\tilde R)\operatorname{vec}^T(\tilde R)\} = \frac{1}{N}\left( \operatorname{vec}(R)\operatorname{vec}^T(R) \right)^{BT}, \tag{A.38}
\]
where the superscript 'BT' denotes block-transpose, which means that each m × m block is transposed. By making use of (4.23), (A.37) and (A.38), it is straightforward to derive C and C̄:
\[
C = \lim_{N\to\infty} N\, \mathrm{E}\{M \operatorname{vec}(\tilde R)\operatorname{vec}^*(\tilde R) M^*\} = M\left( R^T \otimes R \right) M^*, \tag{A.39}
\]
\[
\bar C = \lim_{N\to\infty} N\, \mathrm{E}\{M \operatorname{vec}(\tilde R)\operatorname{vec}^T(\tilde R) M^T\} = M\left( \operatorname{vec}(R)\operatorname{vec}^T(R) \right)^{BT} M^T. \tag{A.40}
\]
Inserting the definition of M given in (4.23) into (A.39) leads, after some tedious calculations, to the following expression for C:
\[
C = \left( R^T \otimes R \right) - \sigma^4 \left( \Pi_A^{\perp T} \otimes \Pi_A^\perp \right) + \frac{\sigma^4}{m-d} \operatorname{vec}(\Pi_A)\operatorname{vec}^*(\Pi_A).
\]
Similarly, we can obtain an expression for C̄ by inserting M into (A.40). Alternatively, one can use the following observation. Let \(L_m = L_m^T\) be the permutation matrix such that
\[
\operatorname{vec}(X^T) = L_m \operatorname{vec}(X) \tag{A.41}
\]
for any (m × m) matrix X [Gra81]. It is then easy to see that f can be related to its conjugate fᶜ as
\[
f^c = \operatorname{vec}\left( [E_s \Lambda E_s^*]^c \right) = \operatorname{vec}\left( [E_s \Lambda E_s^*]^T \right) = L_m f. \tag{A.42}
\]
It is obvious that a similar relation holds for any vector defined as vec(X) when X is Hermitian. This implies that
\[
\bar C = \lim_{N\to\infty} N\, \mathrm{E}\{\tilde f \tilde f^T\} = \lim_{N\to\infty} N\, \mathrm{E}\{\tilde f \tilde f^* L_m\} = C L_m.
\]
Hence, C̄ is related to C through a column permutation.
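The permutation matrix L_m (the vec-permutation, or "commutation", matrix) and the two identities used here — (A.41) and L_m(X ⊗ Y) = (Y ⊗ X)L_m, which is also needed in Appendix A.8 — are easy to verify numerically. A minimal sketch:

```python
import numpy as np

def commutation_matrix(m):
    """L_m such that vec(X^T) = L_m vec(X) for any m x m matrix X."""
    L = np.zeros((m * m, m * m))
    for i in range(m):
        for j in range(m):
            # With column-major (Fortran) vec, entry (i, j) of X sits at j*m + i.
            L[i * m + j, j * m + i] = 1.0
    return L

m = 4
L = commutation_matrix(m)
rng = np.random.default_rng(2)
X = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
Y = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))

vec = lambda M: M.reshape(-1, order="F")                  # column-stacking vec(.)
print(np.allclose(L, L.T))                                # L_m is symmetric
print(np.allclose(vec(X.T), L @ vec(X)))                  # identity (A.41)
print(np.allclose(L @ np.kron(X, Y), np.kron(Y, X) @ L))  # L_m(X(x)Y) = (Y(x)X)L_m
```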
Remark A.2. Observe that the matrix C (and C̄) is singular. The null-space of C is spanned by the columns of \((E_n^c \otimes E_n)\). The dimension of the null-space is (m − d)² and, consequently, the dimension of the range-space of C is m² − (m − d)² = 2md − d². Notice that this equals the number of real parameters in the Cholesky factorization of APA*.
A.8 Proof of Theorem 4.3
Recalling the definition of B in (4.6) and making use of (A.41) leads to
\[
L_m B_k = B_k^c,
\]
which in turn implies that
\[
B^c = L_m B, \qquad B^\dagger L_m = B^{c\dagger}, \qquad L_m \Pi_B^\perp = \Pi_B^{\perp c} L_m = \Pi_B^{\perp T} L_m.
\]
In what follows, we will prove that the two terms in the gradient (4.19) are real-valued if the weighting matrix has a special form satisfied by (4.30). Recall (A.42) and consider the first term in (4.19):
\[
\hat f^* \{\Pi_B^\perp\}_k W \Pi_B^\perp \hat f
= \hat f^T L_m \{\Pi_B^\perp\}_k W L_m \hat f^c
= \hat f^T \{\Pi_B^\perp\}_k^c L_m W L_m \Pi_B^{\perp c} \hat f^c.
\]
If \(L_m W L_m = W^c = W^T\), then the two terms in (4.19) are real-valued and we have
\[
V_k' = 2 \hat f^* \{\Pi_B^\perp\}_k W \Pi_B^\perp \hat f.
\]
This implies that
\[
H = 2 P D^* \Pi_B^\perp W \Pi_B^\perp D P, \tag{A.43}
\]
\[
Q = 4 U^* C U, \tag{A.44}
\]
where U is defined in (4.27). This is, as mentioned above, only valid if \(L_m W L_m = W^c\). We will now prove that this is true for the optimal weighting W_opt. This can be shown as follows:
weight<strong>in</strong>g Wopt. This can be shown as follows:<br />
LmWoptLm = Lm<br />
It can be shown that<br />
�<br />
=<br />
=<br />
�<br />
Lm<br />
Π ⊥ B ˜ CΠ ⊥ B + BB ∗� −1<br />
Lm<br />
�<br />
Π ⊥ B ˜ CΠ ⊥ B + BB ∗�<br />
Lm<br />
� −1<br />
�<br />
Π ⊥c<br />
B Lm ˜ CLmΠ ⊥c<br />
B Lm + B c B T� −1<br />
. (A.45)<br />
Lm (X ⊗ Y) = (Y ⊗ X)Lm<br />
for any arbitrary (m × m) matrices X <strong>and</strong> Y [Gra81] (see also Section<br />
A.4). Hence, we get<br />
Lm ˜ CLm = ˜ C c = ˜ C T<br />
which, in view of (A.45), proves that \(L_m W_{\mathrm{opt}} L_m = W_{\mathrm{opt}}^c\).
Next, we show that Q = 2H for W = W_opt. In order to prove this, the following calculations are needed. Let G be an m² × (m² − d) matrix whose columns span the null-space of B*; then (cf. Lemma 3.1)
\[
\begin{bmatrix} G & B \end{bmatrix}^{-1}
= \left( \begin{bmatrix} G^* \\ B^* \end{bmatrix} \begin{bmatrix} G & B \end{bmatrix} \right)^{-1} \begin{bmatrix} G^* \\ B^* \end{bmatrix}
= \begin{bmatrix} (G^* G)^{-1} & 0 \\ 0 & (B^* B)^{-1} \end{bmatrix} \begin{bmatrix} G^* \\ B^* \end{bmatrix}
= \begin{bmatrix} G^\dagger \\ B^\dagger \end{bmatrix}.
\]
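This block-inverse formula uses only that G*B = 0 and that both blocks have full column rank; a quick numerical check with randomly generated, mutually orthogonal blocks (hypothetical small dimensions):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 6, 4  # G is n x k, B is n x (n - k), with G* B = 0

# Mutually orthogonal ranges from a random unitary matrix, scaled by
# arbitrary full-rank factors so that G and B are not orthonormal.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
G = Q[:, :k] @ (rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k)))
B = Q[:, k:] @ (rng.standard_normal((n - k, n - k)) + 1j * rng.standard_normal((n - k, n - k)))

F = np.hstack([G, B])
Finv_blocks = np.vstack([np.linalg.pinv(G), np.linalg.pinv(B)])
print(np.allclose(np.linalg.inv(F), Finv_blocks))  # True
```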
Using this result in (4.30) and observing that \(\Pi_B^\perp = \Pi_G\) gives
\[
\begin{aligned}
W_{\mathrm{opt}} &= \left( \Pi_G \tilde C \Pi_G + B B^* \right)^{-1}
= \left( \begin{bmatrix} G & B \end{bmatrix}
  \begin{bmatrix} G^\dagger \tilde C G^{\dagger *} & 0 \\ 0 & I \end{bmatrix}
  \begin{bmatrix} G^* \\ B^* \end{bmatrix} \right)^{-1} \\
&= \begin{bmatrix} G^{\dagger *} & B^{\dagger *} \end{bmatrix}
  \begin{bmatrix} (G^\dagger \tilde C G^{\dagger *})^{-1} & 0 \\ 0 & I \end{bmatrix}
  \begin{bmatrix} G^\dagger \\ B^\dagger \end{bmatrix},
\end{aligned}
\]
and
\[
\Pi_B^\perp W_{\mathrm{opt}} \Pi_B^\perp
= \Pi_G \begin{bmatrix} G^{\dagger *} & B^{\dagger *} \end{bmatrix}
  \begin{bmatrix} (G^\dagger \tilde C G^{\dagger *})^{-1} & 0 \\ 0 & I \end{bmatrix}
  \begin{bmatrix} G^\dagger \\ B^\dagger \end{bmatrix} \Pi_G
= G^{\dagger *} (G^\dagger \tilde C G^{\dagger *})^{-1} G^\dagger
= G (G^* \tilde C G)^{-1} G^*. \tag{A.46}
\]
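Identity (A.46) can also be checked numerically. In the sketch below, G is an orthonormal basis for the null-space of B* (so Π_G = GG* = Π_B^⊥) and C̃ is a random Hermitian positive definite matrix; all quantities are small hypothetical stand-ins, not the matrices of the text:

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 7, 2
B = rng.standard_normal((n, d)) + 1j * rng.standard_normal((n, d))

# Orthonormal basis G for the null-space of B*, via the SVD of B.
U = np.linalg.svd(B, full_matrices=True)[0]
G = U[:, d:]                       # n x (n - d), satisfies G* B = 0
Pi_G = G @ G.conj().T              # = Pi_B^perp

X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Ct = X @ X.conj().T + np.eye(n)    # Hermitian positive definite "C-tilde"

W_opt = np.linalg.inv(Pi_G @ Ct @ Pi_G + B @ B.conj().T)
lhs = Pi_G @ W_opt @ Pi_G
rhs = G @ np.linalg.inv(G.conj().T @ Ct @ G) @ G.conj().T
print(np.allclose(lhs, rhs))  # True
```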
From Remark A.2 at the end of Appendix A.7, we know that the rank of C is 2md − d². Let the m² × (2md − d²) matrix T be the Cholesky factor of C and define \(E = \sigma^2 (E_n^c \otimes E_n)\). Recall that the columns of E span the null-space of C (see Remark A.2). Let R(·) and N(·) denote the range- and the null-space of a matrix, respectively. Study the decompositions
\[
\mathbb{C}^{m^2}
= \underbrace{R(T)}_{2md-d^2} \oplus \underbrace{N(T^*)}_{(m-d)^2}
= \underbrace{R(G)}_{m^2-d} \oplus \underbrace{N(G^*)}_{d},
\]
where the dimension of each of the subspaces is indicated, and where ⊕ denotes direct sum. Since it is known that \(N(T^*) = N(C) = R(E) \subset R(G)\), it follows that \(N(G^*) \subset R(T)\). This means that a d-dimensional subspace of R(T) forms N(G*). Define the m² × (2md − d² − d) matrix T̄ via
\[
G^* C G = G^* T T^* G = G^* \bar T \bar T^* G. \tag{A.47}
\]
Notice that \(\tilde C = C + E E^*\) and, hence,
\[
G^* \tilde C G
= G^* \begin{bmatrix} \bar T & E \end{bmatrix} \begin{bmatrix} \bar T & E \end{bmatrix}^* G
= G^* \begin{bmatrix} \bar T & E & B \end{bmatrix} \begin{bmatrix} \bar T & E & B \end{bmatrix}^* G.
\]
For simplicity, introduce the notation
\[
F = \begin{bmatrix} \bar T & E & B \end{bmatrix}.
\]
It is important to realize that the blocks in F are mutually orthogonal and that \(R([\bar T\ E]) = R(G)\). This implies in particular that
\[
F^{-1} = \begin{bmatrix} \bar T^\dagger \\ E^\dagger \\ B^\dagger \end{bmatrix}
\qquad\text{and}\qquad
\Pi_{F^* G} = \begin{bmatrix} I_{2md-d^2-d} & 0 & 0 \\ 0 & I_{(m-d)^2} & 0 \\ 0 & 0 & 0_{d\times d} \end{bmatrix}.
\]
The fact that \(R([\bar T\ E]) = R(G)\) also proves that \(G^* \tilde C G\) is invertible.
Using the observations made above, we obtain
\[
\begin{aligned}
D^* G (G^* \tilde C G)^{-1} G^*
&\overset{1}{=} D^* F^{-*} F^* G (G^* F F^* G)^{-1} G^* F F^{-1}
\overset{2}{=} D^* F^{-*} \Pi_{F^* G} F^{-1} \\
&\overset{3}{=} D^* \begin{bmatrix} \bar T^{\dagger *} & E^{\dagger *} & B^{\dagger *} \end{bmatrix}
\begin{bmatrix} I_{2md-d^2-d} & 0 & 0 \\ 0 & I_{(m-d)^2} & 0 \\ 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} \bar T^\dagger \\ E^\dagger \\ B^\dagger \end{bmatrix} \\
&\overset{4}{=} D^* \bar T^{\dagger *} \bar T^\dagger
\overset{5}{=} D^* ( \bar T \bar T^* )^\dagger.
\end{aligned} \tag{A.48}
\]
In the fourth equality the fact that D*E = 0 was used, and the fifth equality follows from a property of pseudoinverses of full-rank matrices (see Lemma A.1 in Appendix A.11). Now, combining (A.46), (A.47) and (A.48) in the equations (A.43) and (A.44), we get the desired result; that is,
\[
\begin{aligned}
H &= 2 P D^* \Pi_B^\perp W_{\mathrm{opt}} \Pi_B^\perp D P
= 2 P D^* G (G^* \tilde C G)^{-1} G^* D P, \\
Q &= 4 U^* C U
= 4 P D^* \Pi_B^\perp W_{\mathrm{opt}} \Pi_B^\perp C \Pi_B^\perp W_{\mathrm{opt}} \Pi_B^\perp D P \\
&= 4 P D^* G (G^* \tilde C G)^{-1} G^* C G (G^* \tilde C G)^{-1} G^* D P \\
&= 4 P D^* G (G^* \tilde C G)^{-1} G^* \bar T \bar T^* G (G^* \tilde C G)^{-1} G^* D P \\
&= 4 P D^* (\bar T \bar T^*)^\dagger \bar T \bar T^* (\bar T \bar T^*)^\dagger D P
= 4 P D^* (\bar T \bar T^*)^\dagger D P \\
&= 4 P D^* G (G^* \tilde C G)^{-1} G^* D P,
\end{aligned}
\]
and, hence, Q = 2H.
We are now ready to prove Theorem 4.3. It follows from the above that
\[
\Gamma = H^{-1} Q H^{-1} = \left[ P D^* G (G^* \tilde C G)^{-1} G^* D P \right]^{-1} \tag{A.49}
\]
when W = W_opt is used. The expression in (A.49) is recognized as the CRB in Theorem 4.1. This concludes the proof that the weighting matrix in (4.30) gives minimum-variance performance in large samples.
A.9 Proof of Expression (5.22)

In this appendix it is shown that the solution to (5.21) is given by (5.22). Inserting (5.20) in (5.21), we may equivalently write
\[
\max_{\bar K}\, \operatorname{Tr}\left\{ W^{1/2} Y_\alpha \Pi_\alpha^\perp P_\beta^T \bar K^T (\bar K P_\beta \Pi_\alpha^\perp P_\beta^T \bar K^T)^{-1} \bar K P_\beta \Pi_\alpha^\perp Y_\alpha^T W^{1/2} \right\}. \tag{A.50}
\]
The trick is now to change variables. Let
\[
\bar K = \bar T^T E^T (P_\beta \Pi_\alpha^\perp P_\beta^T)^{-1/2}, \tag{A.51}
\]
where T̄ is an n × n matrix of full rank and E is an (m + l)β × n matrix with orthonormal columns. Using the SVD (5.23) and (A.51) in (A.50), we have
\[
\max_{E,\, E^T E = I}\, \operatorname{Tr}\left\{ \hat Q \hat S \hat V^T E E^T \hat V \hat S \hat Q^T \right\}
= \max_{E,\, E^T E = I}\, \operatorname{Tr}\left\{ E^T \hat V \hat S^2 \hat V^T E \right\},
\]
which clearly is solved by E = V̂_s. Inserting this solution in (A.51) gives (5.22).
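The final trace-maximization step — that Tr{Eᵀ V̂Ŝ²V̂ᵀ E} over EᵀE = I is maximized by the n dominant singular vectors — can be illustrated numerically with a random symmetric matrix as a hypothetical stand-in for V̂Ŝ²V̂ᵀ; the optimum equals the sum of the n largest eigenvalues, and no random orthonormal E does better:

```python
import numpy as np

rng = np.random.default_rng(5)
p, n = 8, 3
X = rng.standard_normal((p, p))
M = X @ X.T                          # plays the role of V-hat S-hat^2 V-hat^T

w, V = np.linalg.eigh(M)             # eigenvalues in ascending order
E_opt = V[:, -n:]                    # n dominant eigenvectors
opt = np.trace(E_opt.T @ M @ E_opt)
print(np.isclose(opt, w[-n:].sum())) # True: optimum = sum of n largest eigenvalues

# No random orthonormal E beats the dominant-subspace choice.
ok = True
for _ in range(200):
    E, _ = np.linalg.qr(rng.standard_normal((p, n)))
    ok &= np.trace(E.T @ M @ E) <= opt + 1e-9
print(ok)  # True
```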
A.10 Proof of Expression (5.27)

In this appendix we show that the solution to (5.25) is given by (5.27). Using (5.20), the minimization problem reads
\[
\begin{aligned}
\min_{\bar K} \det\{\Theta\}
&= \min_{\bar K} \det\left\{ Y_\alpha \Pi_\alpha^\perp Y_\alpha^T - Y_\alpha \Pi_\alpha^\perp P_\beta^T \bar K^T (\bar K P_\beta \Pi_\alpha^\perp P_\beta^T \bar K^T)^{-1} \bar K P_\beta \Pi_\alpha^\perp Y_\alpha^T \right\} \\
&= \min_{\bar K} \det\left\{ Y_\alpha \Pi_\alpha^\perp Y_\alpha^T \right\}
 \det\Big\{ I - (Y_\alpha \Pi_\alpha^\perp Y_\alpha^T)^{-1/2} Y_\alpha \Pi_\alpha^\perp P_\beta^T \bar K^T (\bar K P_\beta \Pi_\alpha^\perp P_\beta^T \bar K^T)^{-1} \\
&\qquad\qquad\qquad\qquad\qquad\; \times \bar K P_\beta \Pi_\alpha^\perp Y_\alpha^T (Y_\alpha \Pi_\alpha^\perp Y_\alpha^T)^{-1/2} \Big\} \\
&= \min_{\bar K} \det\left\{ Y_\alpha \Pi_\alpha^\perp Y_\alpha^T \right\} \prod_{k=1}^{n} (1 - \lambda_k),
\end{aligned} \tag{A.52}
\]
where \(\{\lambda_k\}_{k=1}^{n}\) are the nonzero eigenvalues of
\[
M = (Y_\alpha \Pi_\alpha^\perp Y_\alpha^T)^{-1/2} Y_\alpha \Pi_\alpha^\perp P_\beta^T \bar K^T (\bar K P_\beta \Pi_\alpha^\perp P_\beta^T \bar K^T)^{-1} \bar K P_\beta \Pi_\alpha^\perp Y_\alpha^T (Y_\alpha \Pi_\alpha^\perp Y_\alpha^T)^{-1/2}. \tag{A.53}
\]
From the following series of implications,
\[
\Theta \ge 0 \;\Rightarrow\; I - M \ge 0 \;\Rightarrow\; 1 - \lambda_k \ge 0, \quad k = 1, \dots, n,
\]
it is clear that the minimum of (A.52) is attained if we can find a K̄ which maximizes all n nonzero eigenvalues of M. Next it is shown that there indeed exists such a K̄.

If the variable K̄ is changed according to (A.51), the expression (A.53) reads
\[
M = (Y_\alpha \Pi_\alpha^\perp Y_\alpha^T)^{-1/2} Y_\alpha \Pi_\alpha^\perp P_\beta^T (P_\beta \Pi_\alpha^\perp P_\beta^T)^{-1/2} E\, E^T (P_\beta \Pi_\alpha^\perp P_\beta^T)^{-1/2} P_\beta \Pi_\alpha^\perp Y_\alpha^T (Y_\alpha \Pi_\alpha^\perp Y_\alpha^T)^{-1/2}.
\]
The n nonzero eigenvalues of M are then equivalently given by the eigenvalues of
\[
E^T (P_\beta \Pi_\alpha^\perp P_\beta^T)^{-1/2} P_\beta \Pi_\alpha^\perp Y_\alpha^T (Y_\alpha \Pi_\alpha^\perp Y_\alpha^T)^{-1} Y_\alpha \Pi_\alpha^\perp P_\beta^T (P_\beta \Pi_\alpha^\perp P_\beta^T)^{-1/2} E,
\]
which by Poincaré's separation theorem are bounded from above by the n largest eigenvalues of (see, e.g., [Bel70])
\[
(P_\beta \Pi_\alpha^\perp P_\beta^T)^{-1/2} P_\beta \Pi_\alpha^\perp Y_\alpha^T (Y_\alpha \Pi_\alpha^\perp Y_\alpha^T)^{-1} Y_\alpha \Pi_\alpha^\perp P_\beta^T (P_\beta \Pi_\alpha^\perp P_\beta^T)^{-1/2}. \tag{A.54}
\]
The bound is attained if E is taken as the eigenvectors corresponding to the n largest eigenvalues of (A.54). Since M is positive semi-definite and symmetric, the eigendecomposition of M is given by V̂ Ŝ² V̂ᵀ, where V̂ and Ŝ are defined through the SVD (5.26). Hence, the upper bound on the eigenvalues of M is attained if E = V̂_s. The sought K̄ in (A.52) is thus given by (A.51) with E = V̂_s, which is (5.27).
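Poincaré's separation theorem as used here — the eigenvalues of EᵀME, for E with orthonormal columns, are bounded above by the corresponding largest eigenvalues of M, with equality when E spans the dominant eigenspace — can be checked with random matrices (hypothetical small example):

```python
import numpy as np

rng = np.random.default_rng(6)
p, n = 9, 3
X = rng.standard_normal((p, p))
M = X @ X.T                                  # symmetric PSD test matrix
lam = np.sort(np.linalg.eigvalsh(M))[::-1]   # eigenvalues, descending

ok = True
for _ in range(200):
    E, _ = np.linalg.qr(rng.standard_normal((p, n)))
    mu = np.sort(np.linalg.eigvalsh(E.T @ M @ E))[::-1]
    ok &= np.all(mu <= lam[:n] + 1e-9)       # separation bound
print(ok)  # True

# The bound is attained when E spans the dominant eigenspace.
E_opt = np.linalg.eigh(M)[1][:, -n:]
mu_opt = np.sort(np.linalg.eigvalsh(E_opt.T @ M @ E_opt))[::-1]
print(np.allclose(mu_opt, lam[:n]))  # True
```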
A.11 Proof of Theorem 5.1

Let us first introduce the following lemma.

Lemma A.1. Let A be m × r and B be r × n, both of rank r; then
\[
(AB)^\dagger = B^\dagger A^\dagger. \tag{A.55}
\]

Proof. It is straightforward to verify that the expression in (A.55) satisfies the Moore–Penrose conditions. For a reference, see [Alb72].
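Lemma A.1 is easy to corroborate numerically for random full-rank factors (note that the rank-r condition is essential; the identity fails for general A and B):

```python
import numpy as np

rng = np.random.default_rng(7)
m, r, n = 6, 3, 5
A = rng.standard_normal((m, r))   # full column rank r (almost surely)
B = rng.standard_normal((r, n))   # full row rank r (almost surely)

lhs = np.linalg.pinv(A @ B)
rhs = np.linalg.pinv(B) @ np.linalg.pinv(A)
print(np.allclose(lhs, rhs))  # True
```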
Consider the pole estimates from (5.30)–(5.31), where Γ̂_α is given by (5.37). Notice that the analysis below is valid even if the right-hand side of (5.37) is multiplied from the right by any full-rank matrix, since this does not alter the pole estimates. Any such multiplication just corresponds to a similarity transformation and does not affect the eigenvalues of Â. The error in the estimation of A may be written as
\[
\tilde A = \hat A - A
= \left( J_1 \hat \Gamma_\alpha \right)^\dagger \left( J_2 \hat \Gamma_\alpha - J_1 \hat \Gamma_\alpha A \right)
\simeq \left( J_1 \Gamma_\alpha \right)^\dagger \left( J_2 \hat \Gamma_\alpha - J_1 \hat \Gamma_\alpha A \right), \tag{A.56}
\]
where ≃ denotes first-order approximation. Assume that A has distinct eigenvalues and thus can be diagonalized as
\[
A = T^{-1} A_d T,
\]
where A_d is a diagonal matrix with the poles, \(\{\mu_k\}_{k=1}^{n}\), on the diagonal, and T⁻¹ contains the corresponding right eigenvectors of A. It can be shown that errors in the system matrix transform to first-order perturbations of the eigenvalues as
\[
\tilde \mu_k \simeq T_{k\bullet}\, \tilde A\, T^{-1}_{\bullet k}; \tag{A.57}
\]
see, e.g., [Bel70]. Inserting (A.56) in (A.57) yields
\[
\tilde \mu_k \simeq T_{k\bullet} \left( J_1 \Gamma_\alpha \right)^\dagger \left( J_2 \hat \Gamma_\alpha - J_1 \hat \Gamma_\alpha A \right) T^{-1}_{\bullet k}
= T_{k\bullet} \left( J_1 \Gamma_\alpha \right)^\dagger (J_2 - J_1 \mu_k)\, \hat \Gamma_\alpha T^{-1}_{\bullet k}. \tag{A.58}
\]
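The first-order formula (A.57) can be illustrated numerically: for a small perturbation Ã of a diagonalizable A, the eigenvalue shifts agree with \(T_{k\bullet}\tilde A T^{-1}_{\bullet k}\) up to second-order terms. The matrices below are random hypothetical examples (an orthogonal eigenvector matrix is used only for numerical robustness):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 4
mu = np.array([0.1, 0.3, 0.5, 0.7])          # distinct "poles", ascending
Tinv, _ = np.linalg.qr(rng.standard_normal((n, n)))  # right eigenvectors (columns)
T = np.linalg.inv(Tinv)
A = Tinv @ np.diag(mu) @ T                   # A = T^{-1} A_d T

At = 1e-7 * rng.standard_normal((n, n))      # small error A-tilde
mu_hat = np.sort(np.linalg.eigvals(A + At).real)

# First-order prediction mu_tilde_k = T_{k.} At T^{-1}_{.k}, cf. (A.57).
first_order = np.array([T[k, :] @ At @ Tinv[:, k] for k in range(n)])
print(np.allclose(mu_hat - mu, first_order, atol=1e-9))  # True
```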
Note that \(\Gamma_\alpha T^{-1} = \Gamma_\alpha^d\) is the extended observability matrix of the diagonal realization. Hence,
\[
T \left( J_1 \Gamma_\alpha \right)^\dagger = \left( J_1 \Gamma_\alpha T^{-1} \right)^\dagger = \left( J_1 \Gamma_\alpha^d \right)^\dagger, \tag{A.59}
\]
where Lemma A.1 has been used in the first equality. Since \(\Gamma_\alpha = W^{-1/2} Q_s\), we have
\[
W^{-1/2} Q_s T^{-1} = \Gamma_\alpha^d,
\]
which gives an expression for T⁻¹:
\[
T^{-1} = Q_s^T W^{1/2} \Gamma_\alpha^d. \tag{A.60}
\]
Using (A.59) and (A.60) in (A.58) leads to
\[
\tilde \mu_k \simeq \left[ \left( J_1 \Gamma_\alpha^d \right)^\dagger (J_2 - J_1 \mu_k) \right]_{k\bullet} \hat \Gamma_\alpha Q_s^T W^{1/2} (\Gamma_\alpha^d)_{\bullet k}
= f_k^* W^{-1/2} \hat Q_s Q_s^T W^{1/2} (\Gamma_\alpha^d)_{\bullet k},
\]
where \(f_k^*\) is defined in (5.44). Observe that
\[
f_k^* \Gamma_\alpha^d
= \left[ \left( J_1 \Gamma_\alpha^d \right)^\dagger \right]_{k\bullet} \left( J_2 \Gamma_\alpha^d - J_1 \Gamma_\alpha^d \mu_k \right)
= \left[ \left( J_1 \Gamma_\alpha^d \right)^\dagger \right]_{k\bullet} \left( J_1 \Gamma_\alpha^d A_d - J_1 \Gamma_\alpha^d \mu_k \right)
= 0. \tag{A.61}
\]
Thus, \(f_k^*\) is orthogonal to \(\Gamma_\alpha^d\), \(\Gamma_\alpha\) and \(W^{-1/2} Q_s\). Viberg, Ottersten, Wahlberg and Ljung (see [VWO97] and also [VOWL91, VOWL93]) have shown that \(\sqrt{N} f_k^* W^{-1/2} \hat Q_s\) has a limiting zero-mean Gaussian distribution¹, with covariances given by
\[
\lim_{N\to\infty} N\, \mathrm{E}\left\{ (f_k^* W^{-1/2} \hat Q_s)^T (f_l^* W^{-1/2} \hat Q_s) \right\}.
\]
where
\[
b_k = H Q_s^T W^{1/2} (\Gamma_\alpha^d)_{\bullet k}. \tag{A.64}
\]
Inserting the definition of H (A.63) in (A.64) gives
\[
b_k = R H V_s S_s^{-1} Q_s^T W^{1/2} (\Gamma_\alpha^d)_{\bullet k}. \tag{A.65}
\]
Next, recall that \(Q_s S_s V_s^T = \lim_{N\to\infty} \hat Q \hat S \hat V^T\) is given by the limit in (5.42). This implies that
\[
V_s S_s^{-1} Q_s^T = \left( W^{1/2} \Gamma_\alpha^d R^d \right)^\dagger = R^{d\dagger} \left( W^{1/2} \Gamma_\alpha^d \right)^\dagger, \tag{A.66}
\]
where Lemma A.1 has been used in the last equality. Inserting (A.66) in (A.65) gives
\[
b_k = R H R^{d\dagger} \left( W^{1/2} \Gamma_\alpha^d \right)^\dagger W^{1/2} (\Gamma_\alpha^d)_{\bullet k} = \left( R H R^{d\dagger} \right)_{\bullet k}, \tag{A.67}
\]
which is exactly equal to \((\Omega_d)_{\bullet k}\) as defined in (5.45). We have thus proven Theorem 5.1.
A.12 Proof of Theorem 5.3

The proof of Theorem 5.3 is divided into three parts. In part one we prove (5.51). In part two we show that (5.52) is the optimal choice of the weighting Λ. Finally, in part three the calculations leading to (5.53) are given.

Part 1: For convenience, introduce the notation
\[
F = F(\theta) = W^{1/2} \Gamma_\alpha(\theta), \tag{A.68}
\]
\[
\Pi^\perp = \Pi^\perp_{W^{1/2}\Gamma_\alpha(\theta)} = \Pi_F^\perp = I - F (F^* F)^{-1} F^*. \tag{A.69}
\]
The cost function considered, (5.35), now reads
\[
V(\theta) = \operatorname{vec}^T(\Pi^\perp \hat Q_s)\, \Lambda\, \operatorname{vec}(\Pi^\perp \hat Q_s). \tag{A.70}
\]
Since θ̂ minimizes V(θ), a first-order Taylor expansion of the gradient, V′(θ), around the true value, θ₀, leads to
\[
(\hat\theta - \theta_0) = -[\bar V'']^{-1} V'(\theta_0) + o\left( V'(\theta_0) \right), \tag{A.71}
\]
where the limiting Hessian
\[
\bar V'' = \lim_{N\to\infty} V''(\theta_0) \tag{A.72}
\]
is assumed to be invertible². As shown below, \(\sqrt{N} V'(\theta)\) is asymptotically Gaussian. Thus, the normalized parameter estimation error is also asymptotically Gaussian distributed,
\[
\sqrt{N} (\hat\theta - \theta_0) \in \mathrm{AsN}(0, \Xi),
\]
where
\[
\Xi = [\bar V'']^{-1} \Upsilon [\bar V'']^{-1},
\]
and where Υ is
\[
\Upsilon = \lim_{N\to\infty} N\, \mathrm{E}\left\{ [V'(\theta_0)][V'(\theta_0)]^T \right\}. \tag{A.73}
\]
Next we derive expressions for the matrices V̄″ and Υ. In the following, let a subscript i denote the partial derivative with respect to θᵢ. The partial derivative of (A.69) is given by
\[
\Pi_i^\perp = -\Pi^\perp F_i F^\dagger - F^{\dagger *} F_i^* \Pi^\perp. \tag{A.74}
\]
This gives for (A.70)
\[
\begin{aligned}
V_i &= -2 \operatorname{vec}^T(\Pi^\perp F_i F^\dagger \hat Q_s + F^{\dagger *} F_i^* \Pi^\perp \hat Q_s)\, \Lambda\, \operatorname{vec}(\Pi^\perp \hat Q_s) \\
&\simeq -2 \operatorname{vec}^T(\Pi^\perp F_i F^\dagger Q_s)\, \Lambda\, \operatorname{vec}(\Pi^\perp \hat Q_s),
\end{aligned}
\]
where ≃ denotes first-order approximation (around θ₀). Next, a result from [OV94] will be used, modified to the situation in this chapter. The elements of \(\sqrt{N} \operatorname{vec}(\Pi^\perp \hat Q_s)\) have a limiting zero-mean Gaussian distribution with asymptotic covariance
\[
\Sigma = \lim_{N\to\infty} N\, \mathrm{E}\left\{ \operatorname{vec}(\Pi^\perp \hat Q_s) \operatorname{vec}^T(\Pi^\perp \hat Q_s) \right\}
= \sum_{\tau} \left( H^T R_\zeta(\tau) H \right) \otimes \left( \Pi^\perp W^{1/2} R_{n_\alpha}(\tau) W^{1/2} \Pi^\perp \right), \tag{A.75}
\]
The matrix Σ is singular, with a null space of dimension n² (the null space of \(I_n \otimes \Pi^\perp\)), and it has rank equal to n(αl − n) (the dimension of the space onto which \(I_n \otimes \Pi^\perp\) projects)³. Inserting this in (A.73), the (i, j)th element of Υ is given by
\[
\Upsilon_{(ij)} = \lim_{N\to\infty} N\, \mathrm{E}\{V_i V_j\}
= 4 \operatorname{vec}^T(\Pi^\perp F_i F^\dagger Q_s)\, \Lambda \Sigma \Lambda\, \operatorname{vec}(\Pi^\perp F_j F^\dagger Q_s).
\]
Since the second partial derivative of (A.70) reads
\[
V_{ij} = 2 \operatorname{vec}^T(\Pi_{ij}^\perp \hat Q_s)\, \Lambda\, \operatorname{vec}(\Pi^\perp \hat Q_s) + 2 \operatorname{vec}^T(\Pi_i^\perp \hat Q_s)\, \Lambda\, \operatorname{vec}(\Pi_j^\perp \hat Q_s)
\]
and \(\hat Q_s \to Q_s\) w.p. 1⁴, the (i, j)th element of the limiting Hessian (A.72) is given by
\[
\bar V''_{(ij)} = \lim_{N\to\infty} V_{ij}
= 2 \operatorname{vec}^T(\Pi_i^\perp Q_s)\, \Lambda\, \operatorname{vec}(\Pi_j^\perp Q_s)
= 2 \operatorname{vec}^T(\Pi^\perp F_i F^\dagger Q_s)\, \Lambda\, \operatorname{vec}(\Pi^\perp F_j F^\dagger Q_s).
\]
Introduce a matrix G whose ith column is given by
\[
G_{\bullet i} = \operatorname{vec}(\Pi^\perp F_i F^\dagger Q_s)
= \operatorname{vec}\left( \Pi^\perp_{W^{1/2}\Gamma_\alpha} W^{1/2} \Gamma_{\alpha i} (W^{1/2}\Gamma_\alpha)^\dagger Q_s \right),
\]
where all expressions are evaluated at θ₀. Using the identity
\[
\operatorname{vec}(ABC) = (C^T \otimes A)\operatorname{vec}(B) \tag{A.76}
\]
for matrices with compatible dimensions, G can be written as
\[
G = \left[ \left( (W^{1/2}\Gamma_\alpha)^\dagger Q_s \right)^T \otimes \Pi^\perp_{W^{1/2}\Gamma_\alpha} W^{1/2} \right]
\begin{bmatrix} \operatorname{vec}(\Gamma_{\alpha 1}) & \operatorname{vec}(\Gamma_{\alpha 2}) & \dots & \operatorname{vec}(\Gamma_{\alpha d}) \end{bmatrix}, \tag{A.77}
\]

³ We have no proof of the exact rank of Σ. In other words, we do know that the rank is no more than n(αl − n), but we cannot state (useful) conditions on the data and on the system for the rank to be equal to n(αl − n). However, in all numerical examples we have performed, the rank was n(αl − n).

⁴ It should be noted that possible non-uniqueness of the SVD components may affect "local" results in this chapter; however, in all final results only unique parts appear. Q_s is, for example, not unique if several singular values in S_s are equal.
which is (5.47). Remember that d is the number of parameters in θ. If Γ_α(θ) is properly parameterized, the matrix G will have full rank. With this notation, the matrices Υ and V̄″ may be written in the following compact form:
\[
\Upsilon = 4 G^T \Lambda \Sigma \Lambda G, \qquad \bar V'' = 2 G^T \Lambda G.
\]
The asymptotic covariance matrix is thus given by
\[
\Xi = [\bar V'']^{-1} \Upsilon [\bar V'']^{-1} = (G^T \Lambda G)^{-1} (G^T \Lambda \Sigma \Lambda G) (G^T \Lambda G)^{-1},
\]
which is (5.51).
Part 2: In this part we show that the covariance matrix (5.51), as a function of the weighting matrix Λ, has a lower bound, and that this bound is attained for the choice made in (5.52). The rank of Σ is n(αl − n), which is equal to the rank of \(I \otimes \Pi^\perp_{W^{1/2}\Gamma_\alpha}\). Studying the expression for Σ in (A.75), it is seen that the range space of Σ is given by the range space of \(I \otimes \Pi^\perp_{W^{1/2}\Gamma_\alpha}\). Similarly, it can be seen from (A.77) that the range space of G lies in the range space of \(I \otimes \Pi^\perp_{W^{1/2}\Gamma_\alpha}\), i.e., in the range space of Σ. Hence, the projector onto the range space of Σ has no effect when operating on G, i.e., ΣΣ†G = G.

Next we show that GᵀΣ†G is positive definite. This can be proven by using the orthogonal decomposition of the Euclidean space
\[
\mathbb{R}^{n\alpha l} = R(\Sigma) \oplus N(\Sigma^T),
\]
where N(·) denotes the null space of (·) and ⊕ denotes direct sum; see, e.g., [SS89]. Since, for any x ≠ 0, Gx is nonzero and lies in R(Σ), it cannot be in N(Σᵀ) = N(Σ) = N(Σ†), and thus GᵀΣ†G > 0.
Study the following matrix:
\[
\begin{bmatrix} G^T \Lambda \Sigma \Lambda G & G^T \Lambda G \\ G^T \Lambda G & G^T \Sigma^\dagger G \end{bmatrix}
= \begin{bmatrix} G^T \Lambda \Sigma \Lambda G & G^T \Lambda \Sigma \Sigma^\dagger G \\ G^T \Sigma^\dagger \Sigma \Lambda G & G^T \Sigma^\dagger \Sigma \Sigma^\dagger G \end{bmatrix}
= \begin{bmatrix} G^T \Lambda \\ G^T \Sigma^\dagger \end{bmatrix} \Sigma \begin{bmatrix} \Lambda G & \Sigma^\dagger G \end{bmatrix} \ge 0, \tag{A.78}
\]
where the last inequality follows since Σ ≥ 0. The matrix inequality should be interpreted in the sense that the matrix is positive semi-definite. Since
the matrix (A.78) is positive semi-definite, the same holds for the Schur complement with respect to GᵀΣ†G [SS89], i.e.,
\[
G^T \Lambda \Sigma \Lambda G - G^T \Lambda G (G^T \Sigma^\dagger G)^{-1} G^T \Lambda G \ge 0. \tag{A.79}
\]
The matrix GᵀΛG was shown in part one of this appendix to be the limiting Hessian of the cost function (5.34). We have previously assumed it to be invertible for the analysis to hold. This sets some constraints on the choice of the weighting Λ, but is needed for parameter identifiability. Multiplying (A.79) from the left and right by (GᵀΛG)⁻¹ gives the desired result:
\[
(G^T \Sigma^\dagger G)^{-1} \le (G^T \Lambda G)^{-1} (G^T \Lambda \Sigma \Lambda G) (G^T \Lambda G)^{-1}.
\]
Thus, the weighting matrix in (5.52) gives a lower bound on the asymptotic covariance matrix.
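The lower bound can be demonstrated numerically. Below, Σ is a random singular PSD matrix, G is generated inside the range of Σ (so that ΣΣ†G = G holds, as in the text) and Λ is a random SPD weighting — all hypothetical stand-ins. The difference between the sandwich covariance and (GᵀΣ†G)⁻¹ is then positive semi-definite:

```python
import numpy as np

rng = np.random.default_rng(9)
p, r, d = 8, 5, 2
U, _ = np.linalg.qr(rng.standard_normal((p, r)))
Sigma = U @ np.diag(rng.uniform(0.5, 2.0, r)) @ U.T   # rank-r PSD matrix
G = U @ rng.standard_normal((r, d))                   # R(G) inside R(Sigma)

X = rng.standard_normal((p, p))
Lam = X @ X.T + np.eye(p)                             # SPD weighting

Sp = np.linalg.pinv(Sigma)
bound = np.linalg.inv(G.T @ Sp @ G)                   # (G^T Sigma^+ G)^{-1}
GLG_inv = np.linalg.inv(G.T @ Lam @ G)
sandwich = GLG_inv @ (G.T @ Lam @ Sigma @ Lam @ G) @ GLG_inv

# sandwich - bound should be positive semi-definite for any SPD Lam.
print(np.all(np.linalg.eigvalsh(sandwich - bound) >= -1e-9))  # True
```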
Part 3: In this part, the expression (5.53) is derived. The first equality in (5.53) follows trivially from (5.51) and (5.52), since Σ†ΣΣ† = Σ†. The remainder of this appendix proves the second equality in (5.53) (and thus Corollary 5.4).

First, note that
\[
(W^{1/2}\Gamma_\alpha)^\dagger Q_s = (Q_s T^{-1})^\dagger Q_s = T Q_s^T Q_s = T, \tag{A.80}
\]
where T is the similarity transformation matrix describing the fact that the limiting estimate of the observability matrix may correspond to a different realization than the one corresponding to Γ_α. Remember that Γ_α is the parameterized extended observability matrix of some realization. Here, \(\hat Q_s \to Q_s = W^{1/2} \Gamma_\alpha T\) as N tends to infinity. The matrix T depends on W. Let
\[
\Pi^\perp_{W^{1/2}\Gamma_\alpha} = E E^T \tag{A.81}
\]
be an orthonormal factorization, i.e., E is αl × (αl − n) and EᵀE = I. Such a factorization is possible since a projection matrix is symmetric and has eigenvalues equal to one or zero. Observe that this implies that E† = Eᵀ. In the following, we make extensive use of the following rules for Kronecker products:
\[
(A \otimes B)(C \otimes D) = AC \otimes BD, \qquad
(A \otimes B)^T = A^T \otimes B^T, \qquad
(A \otimes B)^\dagger = A^\dagger \otimes B^\dagger.
\]
Rewrite (5.48) as
\[
\Sigma = (I_n \otimes E) \left( \sum_{\tau} \left( H^T R_\zeta(\tau) H \right) \otimes \left( E^T W^{1/2} R_{n_\alpha}(\tau) W^{1/2} E \right) \right) (I_n \otimes E^T),
\]
where (A.80), (A.66) and Lemma A.1 have been used. Hence, HT⁻¹ is equal to \(\Omega_{\theta_0}\) defined in (5.49). Finally, the factor \(W^{1/2} E (W^{1/2} E)^\dagger\) in (A.84) is a projection matrix onto R[W^{1/2}E]. Now, the columns of E form a basis for the null-space of \(\Gamma_\alpha^* W^{1/2}\), denoted N[Γ*_α W^{1/2}]. This implies that the columns of W^{1/2}E form a basis for N[Γ*_α]. Thus, a projector projecting onto R[W^{1/2}E] in fact projects onto N[Γ*_α]. To conclude, the projector \(W^{1/2} E (W^{1/2} E)^\dagger\) can be written as
\[
\Pi^\perp_{\Gamma_\alpha} = I - \Gamma_\alpha (\Gamma_\alpha^* \Gamma_\alpha)^{-1} \Gamma_\alpha^*,
\]
and the expression (A.83) simplifies to (5.53), which was to be proved.
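The concluding projector identity — that \(W^{1/2}E(W^{1/2}E)^\dagger = \Pi^\perp_{\Gamma_\alpha}\) whenever the columns of E span N[Γ*_α W^{1/2}] — can be checked with random real matrices (hypothetical small dimensions):

```python
import numpy as np

rng = np.random.default_rng(10)
p, n = 7, 3
Gamma = rng.standard_normal((p, n))               # stand-in for Gamma_alpha
X = rng.standard_normal((p, p))
W = X @ X.T + np.eye(p)                           # SPD weighting
w, V = np.linalg.eigh(W)
Whalf = V @ np.diag(np.sqrt(w)) @ V.T             # symmetric square root W^(1/2)

# E: orthonormal basis for the null-space of Gamma^T W^(1/2).
_, _, Vt = np.linalg.svd(Gamma.T @ Whalf)
E = Vt[n:].T                                      # p x (p - n), E^T E = I

WE = Whalf @ E
P1 = WE @ np.linalg.pinv(WE)                      # projector onto R[W^(1/2) E]
P2 = np.eye(p) - Gamma @ np.linalg.inv(Gamma.T @ Gamma) @ Gamma.T
print(np.allclose(P1, P2))  # True
```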
A.13 Proof of Theorem 5.5

In this appendix it is shown that (5.55) and (5.56) are asymptotically equivalent estimators. Let
\[
\hat R = W^{1/2} Y_\alpha \Pi_\alpha^\perp P_\beta^T (P_\beta \Pi_\alpha^\perp P_\beta^T)^{-1} P_\beta \Pi_\alpha^\perp Y_\alpha^T W^{1/2}
= \hat Q_s \hat S_s^2 \hat Q_s^T + \hat Q_n \hat S_n^2 \hat Q_n^T,
\]
and let F be as in (A.68). Also recall the expression (A.69) and the corresponding derivative (A.74). Introduce the two cost functions
\[
V(\theta) = \operatorname{Tr}\left\{ \Pi_F^\perp \hat R \right\}, \tag{A.85}
\]
\[
J(\theta) = \operatorname{Tr}\left\{ \Pi_F^\perp \hat Q_s \hat S_s^2 \hat Q_s^T \right\}. \tag{A.86}
\]
Observe that R̂ is real and thus there exist real singular vectors; F may, on the other hand, be complex-valued. Since the minimizers of (A.85) and (A.86) are consistent estimates of the true parameters, it suffices, under mild conditions, to show that [Lju87, SS89]
\[
V_i = J_i + o_p(1/\sqrt{N}), \tag{A.87}
\]
\[
V_{ij} = J_{ij} + o_p(1), \tag{A.88}
\]
where \(V_i\) denotes the partial derivative with respect to θᵢ and, similarly, \(V_{ij}\) denotes the second partial derivative (cf. (A.71)). The notation \(o_p(\cdot)\) denotes the probability version of the corresponding deterministic notation⁵. Perturbation theory of subspaces gives
\[
\Pi_F^\perp \hat R Q_s = \Pi_F^\perp \hat Q_s S_s^2 + o_p(1/\sqrt{N});
\]

⁵ A sequence, x_N, of random variables is \(o_p(a_N)\) if the probability limit of \(a_N^{-1} x_N\) is zero.
see Section 2.3.4 or [CTO89]. The following holds for the limiting principal singular vectors:

Qs T = F

for some full rank T. The scene is now set to verify (A.87) and (A.88) by straightforward calculations. Consider (A.87):

Vi = Tr{(Π⊥F)i R̂}
   = −2 Re Tr{Π⊥F Fi (F*F)^{-1} F* R̂}
   = −2 Re Tr{Fi (F*F)^{-1} T* Qs* R̂ Π⊥F}
   = −2 Re Tr{Fi (F*F)^{-1} T* Ss² Q̂s^T Π⊥F} + op(1/√N),

Ji = Tr{(Π⊥F)i Q̂s Ŝs² Q̂s^T}
   = −2 Re Tr{Π⊥F Fi (F*F)^{-1} F* Q̂s Ŝs² Q̂s^T}
   = −2 Re Tr{Π⊥F Fi (F*F)^{-1} (Qs T)* Qs Ss² Q̂s^T} + op(1/√N)
   = −2 Re Tr{Fi (F*F)^{-1} T* Ss² Q̂s^T Π⊥F} + op(1/√N).
Hence, (A.87) holds true. Now turn to (A.88). Since

R̂ = R + op(1) = Qs Ss² Qs^T + op(1),

the result is immediate.
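For completeness, one way to spell out this step (a sketch in the notation above, not a verbatim passage from the thesis): since R̂ = Q̂s Ŝs² Q̂s^T + Q̂n Ŝn² Q̂n^T,

```latex
V_{ij} - J_{ij}
  = \operatorname{Tr}\bigl\{ (\Pi_F^{\perp})_{ij}\,\bigl(\hat R - \hat Q_s \hat S_s^2 \hat Q_s^T\bigr) \bigr\}
  = \operatorname{Tr}\bigl\{ (\Pi_F^{\perp})_{ij}\,\hat Q_n \hat S_n^2 \hat Q_n^T \bigr\}
  = o_p(1),
```

because R̂ = R + op(1) = Qs Ss² Qs^T + op(1) implies Ŝn² = op(1), while the second derivative of the projector remains bounded in probability.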
A.14 Rank of Coefficient Matrix

Lemma A.2. Let A be an n × n matrix with all eigenvalues inside the unit circle and let B be an n × m matrix. Define the matrix polynomial F^u(q^{-1}) as in (6.17). Then, the coefficient matrix

[F^u_n ... F^u_1]

has the same rank as the reachability matrix of (A, B).
Proof. The result follows from a simple construction. Let a(q^{-1}) be as in (6.18); then,

F^u(q^{-1}) = Adj{qI − A}B / q^n
            = a(q^{-1}) (qI − A)^{-1} B
            = (a0 + a1 q^{-1} + · · · + an q^{-n}) q^{-1} (I + q^{-1}A + q^{-2}A² + ...) B.

Comparison of coefficients leads to

[F^u_n ... F^u_1] = [A^{n-1}B  A^{n-2}B  ...  B]
                    ⎡ a0 Im       0         0    ...   0     ⎤
                  × ⎢ a1 Im     a0 Im       0                ⎥
                    ⎢   ⋮                   ⋱                ⎥
                    ⎣ a_{n-1}Im a_{n-2}Im  ...       a0 Im   ⎦ ,

where Im is the m × m identity matrix. The last factor is clearly nonsingular (since a0 = 1), so the lemma is proven.
The lemma shows that the first block row of S as def<strong>in</strong>ed <strong>in</strong> (6.19) has<br />
full row rank if (A, [B K]) is reachable. This implies that S has full row<br />
rank (s<strong>in</strong>ce a0 = 1).<br />
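The factorization used in the proof can be checked numerically. The sketch below builds a small random example (the dimensions, system matrices, and variable names are illustrative stand-ins, not quantities from the thesis), forms each F^u_k by the coefficient matching above, and verifies both the Toeplitz factorization and the rank claim of Lemma A.2:

```python
import numpy as np
from numpy.linalg import matrix_power, matrix_rank

rng = np.random.default_rng(0)

# Illustrative system: n = 4 states, m = 2 inputs.
n, m = 4, 2
A = rng.standard_normal((n, n))
A *= 0.9 / np.max(np.abs(np.linalg.eigvals(A)))  # place all eigenvalues inside the unit circle
B = rng.standard_normal((n, m))

# a(q^{-1}) = det(qI - A)/q^n gives coefficients a_0, ..., a_n with a_0 = 1.
a = np.poly(A).real

# Coefficient matching in the proof: F^u_k = sum_{j=0}^{k-1} a_j A^{k-1-j} B.
Fu = {k: sum(a[j] * matrix_power(A, k - 1 - j) @ B for j in range(k))
      for k in range(1, n + 1)}

coef = np.hstack([Fu[k] for k in range(n, 0, -1)])                     # [F^u_n ... F^u_1]
reach = np.hstack([matrix_power(A, n - 1 - i) @ B for i in range(n)])  # [A^{n-1}B ... B]

# Block lower-triangular Toeplitz factor with a_0 I_m on the diagonal.
T = np.zeros((n * m, n * m))
for i in range(n):
    for j in range(i + 1):
        T[i * m:(i + 1) * m, j * m:(j + 1) * m] = a[i - j] * np.eye(m)

assert np.allclose(coef, reach @ T)              # the factorization in the proof
assert matrix_rank(coef) == matrix_rank(reach)   # equal ranks, since T is nonsingular
```

Since the last factor T is nonsingular whenever a0 = 1, the rank equality follows for any (A, B), reachable or not.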
A.15 Rank of Sylvester Matrix<br />
Consider the Sylvester matrix Sy in (6.31), generalized to the multi-input case. This means that every ai is replaced by ai Im and that every F^y_i is now a matrix of dimension l × m. The dimension of Sy becomes ((α + β)m + βl) × ((α + β + n)m). Observe that the relation (6.30) holds. Define

z = ⎡ y^d_β(t)   ⎤
    ⎢ uβ(t)      ⎥
    ⎣ uα(t + β)  ⎦

and express z in the following two equivalent ways:

z = Sy (1/a(q^{-1})) u_{α+β+n}(t − n),

and

    ⎡ Γβ   Φβ    0   ⎤ ⎡ x^d(t)    ⎤     ⎡ x^d(t)    ⎤
z = ⎢ 0    Iβm   0   ⎥ ⎢ uβ(t)     ⎥ ≜ Ξ ⎢ uβ(t)     ⎥ .
    ⎣ 0    0    Iαm  ⎦ ⎣ uα(t + β) ⎦     ⎣ uα(t + β) ⎦
Next, study

Ē{zz^T} = Sy Ē{ (1/a(q^{-1})) u_{α+β+n}(t − n) (1/a(q^{-1})) u^T_{α+β+n}(t − n) } Sy^T   (A.89)

              ⎧⎡ x^d(t)    ⎤ ⎡ x^d(t)    ⎤T ⎫
        = Ξ Ē ⎨⎢ uβ(t)     ⎥ ⎢ uβ(t)     ⎥  ⎬ Ξ^T.                                       (A.90)
              ⎩⎣ uα(t + β) ⎦ ⎣ uα(t + β) ⎦  ⎭

From the proof of Theorem 6.2 we know that the covariance matrix in the middle of (A.90) is positive definite if u(t) is at least PE(α + β + n) and if (A, B) is reachable. If rank{Γβ} = n, then Ξ has full column rank and it follows that

rank{Ē{zz^T}} = n + (α + β)m.   (A.91)

Furthermore, if u(t) is PE(α + β + n), then

Ē{ (1/a(q^{-1})) u_{α+β+n}(t − n) (1/a(q^{-1})) u^T_{α+β+n}(t − n) }

is positive definite [SS83]. Notice that we assume that the system is stable; this assumption implies that a(z) has all roots outside the unit circle. Comparing (A.91) and (A.89), it can be concluded that

rank{Sy} = n + (α + β)m.

This result holds if (A, B) is reachable, if (A, C) is observable, and if β is greater than or equal to the observability index. The observability and reachability assumptions are equivalent to requiring that the polynomials F^y(z) and a(z) do not share any common zeros.
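The key rank step, rank{Ξ M Ξ^T} = n + (α + β)m for a full-column-rank Ξ and positive definite M, is a generic linear-algebra fact and can be illustrated numerically (all dimensions and matrices below are arbitrary stand-ins, not the thesis quantities):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative dimensions standing in for n, m, l, alpha, beta.
n, m, l, alpha, beta = 3, 2, 2, 2, 3
rows = (alpha + beta) * m + beta * l   # number of rows of Xi (and of z)
cols = n + (alpha + beta) * m          # number of columns of Xi

Xi = rng.standard_normal((rows, cols))  # a generic tall matrix has full column rank
G = rng.standard_normal((cols, cols))
M = G @ G.T + cols * np.eye(cols)       # positive definite "middle" covariance

Rzz = Xi @ M @ Xi.T                     # plays the role of E{z z^T} in (A.90)
assert np.linalg.matrix_rank(Rzz) == cols  # = n + (alpha + beta) * m
```

Positive definiteness of M is essential here; with a merely positive semidefinite middle factor, the rank of Ξ M Ξ^T could drop below that of Ξ.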
Appendix B<br />
Notations<br />
In general, boldface uppercase letters are used for matrices <strong>and</strong> boldface<br />
lowercase letters denote vectors. Scalars are set <strong>in</strong> normal typeface.<br />
R, C The set of real <strong>and</strong> complex numbers, respectively.<br />
m × n The dimension of a matrix with m rows <strong>and</strong> n columns.<br />
X ∈ R m×n X is an m × n matrix of real numbers.<br />
X ∈ C m×n X is an m × n matrix of complex numbers.<br />
Aij           The (i, j)th element of the matrix A.
θk            The kth element of the vector θ.
A•k           The kth column of A.
Ak•           The kth row of A.
A T ,A ∗ ,A c The transpose, complex conjugate transpose <strong>and</strong> complex<br />
conjugate of the matrix A, respectively.<br />
A −T The transpose of the <strong>in</strong>verse of A. That is, A −T =<br />
(A −1 ) T .<br />
A † The Moore-Penrose pseudo <strong>in</strong>verse of A.<br />
Tr{A} The trace operator [Gra81].<br />
‖x‖2          The 2-norm of the vector x (‖x‖2² = x*x).
‖A‖F          The Frobenius norm of the matrix A.
vec{A} The vectorization operator [Gra81].<br />
⊗ Kronecker product [Gra81, Bre78].<br />
⊙ The Hadamard or Schur product (elementwise multiplication).
◦ The Khatri-Rao product (columnwise Kronecker product) [Bre78].
In            The n × n identity matrix. The subscript is often omitted.
0m×n          An m × n matrix of zeros. The subscript is typically omitted.
diag{x} A diagonal matrix with the elements of the vector x on<br />
the diagonal.<br />
Re{A},Im{A} The real <strong>and</strong> imag<strong>in</strong>ary part of A, respectively.<br />
A > 0 (A ≥ 0) The Hermitian matrix A = A ∗ is positive def<strong>in</strong>ite (positive<br />
semidef<strong>in</strong>ite). For two Hermitian matrices A <strong>and</strong><br />
B, the notation A > B (A ≥ B) means that A − B is<br />
positive def<strong>in</strong>ite (positive semidef<strong>in</strong>ite).<br />
rank(A) The rank of the matrix A.<br />
N {A} The null-space of the matrix A.<br />
R{A} The range-space of A.<br />
⊕ Direct orthogonal sum of vector spaces.<br />
arg min_x f(x)   The value of x that minimizes f(x).
ˆθ An estimate of the quantity θ.<br />
O(x) An arbitrary function such that |O(x)/x| ≤ c < ∞ as<br />
x → 0.<br />
o(x) An arbitrary function such that |o(x)/x| → 0 as x → 0.
Op(x), op(x) The <strong>in</strong>-probability versions of O(x) <strong>and</strong> o(x), respectively<br />
[MKB79].<br />
δi,j          Kronecker's delta function (δi,j = 1 if i = j and zero otherwise).
⌊k⌋ The largest <strong>in</strong>teger less than or equal to k ∈ R.<br />
s ∈ (r,t] Half-open <strong>in</strong>terval, r < s ≤ t.<br />
q, q −1 The forward <strong>and</strong> backward unit shift operators, respectively<br />
(qu(t) = u(t + 1) <strong>and</strong> q −1 u(t) = u(t − 1)).<br />
log{·} The natural logarithm.<br />
det{·} The determ<strong>in</strong>ant function.<br />
E{·} Mathematical expectation.<br />
Ē{x(t)}       lim_{N→∞} (1/N) Σ_{t=1}^{N} E{x(t)}.
x ∈ N(m,P) The vector x is (complex) Gaussian distributed with<br />
mean m <strong>and</strong> covariance P.<br />
x ∈ AsN(m,P) The vector x is asymptotically (complex) Gaussian distributed<br />
with mean m <strong>and</strong> covariance P.
Appendix C<br />
Abbreviations <strong>and</strong><br />
Acronyms<br />
4SID State-space subspace system identification<br />
ABC Asymptotically best consistent<br />
AR Autoregressive<br />
ARMA Autoregressive mov<strong>in</strong>g average<br />
ARX Autoregressive model with exogenous <strong>in</strong>puts<br />
CRB Cramér Rao lower bound<br />
CVA Canonical variate analysis<br />
DEUCE Direction estimator for uncorrelated emitters<br />
DOA Direction of arrival<br />
ESPRIT Estimation of signal parameters by rotational <strong>in</strong>variance<br />
techniques<br />
EVD Eigenvalue decomposition<br />
FB Forward backward<br />
FS F<strong>in</strong>ite samples
GWSF Generalized weighted subspace fitt<strong>in</strong>g<br />
IV Instrumental variable<br />
ME Model errors<br />
ML Maximum likelihood<br />
MLE Maximum likelihood estimate<br />
MODE Method of direction estimation<br />
MOESP Multi-<strong>in</strong>put multi-output output error state-space model<br />
realization<br />
MUSIC Multiple signal classification<br />
N4SID Numerical algorithms for subspace state space system identification<br />
PEM Prediction error method<br />
SNR Signal to noise ratio<br />
SVD S<strong>in</strong>gular value decomposition<br />
ULA Uniform l<strong>in</strong>ear array<br />
w.p.1 With probability one<br />
WSF Weighted subspace fitt<strong>in</strong>g
Bibliography<br />
[AJ76] B. D. O. Anderson <strong>and</strong> E. I. Jury. “Generalized Bezoutian<br />
<strong>and</strong> Sylvester Matrices <strong>in</strong> Multivariable L<strong>in</strong>ear Control”.<br />
IEEE Transactions on Automatic Control, AC-21:551–556,<br />
Aug. 1976.<br />
[AK90] K. S. Arun <strong>and</strong> S. Y. Kung. “Balanced Approximation of<br />
Stochastic <strong>System</strong>s”. SIAM J. Matrix Analysis <strong>and</strong> Applications,<br />
11:42–68, 1990.<br />
[Aka74a] H. Akaike. “A New Look at the Statistical Model <strong>Identification</strong>”.<br />
IEEE Trans. Automat. Contr., AC-19:716–723, Dec.<br />
1974.<br />
[Aka74b] H. Akaike. “Stochastic Theory of M<strong>in</strong>imal Realization”.<br />
IEEE Trans. Automat. Contr., AC-19(6):667–674, Dec. 1974.<br />
[Aka75] H. Akaike. “Markovian Representation of Stochastic Processes<br />
by Canonical Variables”. SIAM Journal on Control,<br />
13(1):162–173, 1975.<br />
[Alb72] A. Albert. Regression <strong>and</strong> the Moore-Penrose Pseudo<strong>in</strong>verse.<br />
Academic Press, New York, 1972.<br />
[And84] T.W. Anderson. An Introduction to Multivariate Statistical Analysis, 2nd edition. John Wiley & Sons, New York, 1984.
[Aok87] M. Aoki. State Space Model<strong>in</strong>g of Time Series. Spr<strong>in</strong>ger<br />
Verlag, Berl<strong>in</strong>, 1987.<br />
[Ban71] W. J. Bangs. Array Process<strong>in</strong>g with Generalized Beamformers.<br />
PhD thesis, Yale University, New Haven, CT, 1971.
[Bar83] A.J. Barabell. “Improv<strong>in</strong>g the Resolution Performance of<br />
Eigenstructure-Based Direction-F<strong>in</strong>d<strong>in</strong>g Algorithms”. In<br />
Proc. ICASSP 83, pages 336–339, Boston, MA, 1983.<br />
[Bay92] D. Bayard. “An Algorithm for State-space Frequency Doma<strong>in</strong><br />
<strong>Identification</strong> Without W<strong>in</strong>dow<strong>in</strong>g Distortions”. In Proc. 31st<br />
IEEE Conf. on Decision <strong>and</strong> Control, pages 1707–1712, Tucson,<br />
AZ, 1992.<br />
[BDS97] D. Bauer, M. Deistler, <strong>and</strong> W. Scherrer. “The Analysis of the<br />
Asymptotic Variance of <strong>Subspace</strong> Algorithms”. In Proc. 11th<br />
IFAC Symposium on <strong>System</strong> <strong>Identification</strong>, pages 1087–1091,<br />
Fukuoka, Japan, 1997.<br />
[Bel70] R. Bellman. Introduction to Matrix Analysis, 2nd edition. McGraw-Hill, New York, 1970.
[Bie79] G. Bienvenue. “Influence of the Spatial Coherence of the<br />
Background Noise on High Resolution Passive <strong>Methods</strong>”. In<br />
Proc. ICASSP’79, pages 306–309, Wash<strong>in</strong>gton, DC, 1979.<br />
[BKAK78] R. R. Bitmead, S. Y. Kung, B. D. O. Anderson, <strong>and</strong><br />
T. Kailath. “Greatest Common Divisors via Generalized<br />
Sylvester <strong>and</strong> Bezout Matrices”. IEEE Transactions on Automatic<br />
Control, 23(6):1043–1047, Dec. 1978.<br />
[BM86] Y. Bresler <strong>and</strong> A. Macovski. “Exact Maximum Likelihood Parameter<br />
Estimation of Superimposed Exponential Signals <strong>in</strong><br />
Noise”. IEEE Trans. ASSP, ASSP-34:1081–1089, Oct. 1986.<br />
[Böh86] J.F. Böhme. “Estimation of Spectral Parameters of Correlated<br />
Signals <strong>in</strong> Wavefields”. Signal Process<strong>in</strong>g, 10:329–337,<br />
1986.<br />
[Boh91] T. Bohl<strong>in</strong>. “Interactive <strong>System</strong> <strong>Identification</strong>: Prospects <strong>and</strong><br />
Pitfalls”. Spr<strong>in</strong>ger-Verlag, 1991.<br />
[Bre78] J. W. Brewer. “Kronecker Products <strong>and</strong> Matrix Calculus<br />
<strong>in</strong> <strong>System</strong> Theory”. IEEE Trans. on Circuits <strong>and</strong> <strong>System</strong>s,<br />
CAS-25(9):772–781, September 1978.<br />
[CK95] Y.M. Cho <strong>and</strong> T. Kailath. “Fast <strong>Subspace</strong>-based <strong>System</strong><br />
<strong>Identification</strong>: An Instrumental Variable Approach”. Automatica,<br />
31(6):903–905, 1995.
[Com82] R. T. Compton. “The Effect of R<strong>and</strong>om Steer<strong>in</strong>g Vector<br />
Errors <strong>in</strong> the Applebaum Adaptive Array”. IEEE Trans. on<br />
Aero. <strong>and</strong> Elec. Sys., AES-18(5):392–400, Sept. 1982.<br />
[Cra46] H. Cramér. Mathematical <strong>Methods</strong> of Statistics. Pr<strong>in</strong>ceton<br />
University Press, 1946.<br />
[CS96] M. Cedervall <strong>and</strong> P. Stoica. “<strong>System</strong> <strong>Identification</strong> from<br />
Noisy Measurements by Us<strong>in</strong>g Instrumental Variables <strong>and</strong><br />
<strong>Subspace</strong> Fitt<strong>in</strong>g”. Circuits, <strong>System</strong>s <strong>and</strong> Signal Process<strong>in</strong>g,<br />
15(2):275–290, 1996.<br />
[CTO89] H. Clergeot, S. Tressens, <strong>and</strong> A. Ouamri. “Performance<br />
of High Resolution Frequencies Estimation <strong>Methods</strong> Compared<br />
to the Cramér-Rao Bounds”. IEEE Trans. ASSP,<br />
37(11):1703–1720, November 1989.<br />
[CXK94] Y.M. Cho, G. Xu, <strong>and</strong> T. Kailath. “Fast Recursive <strong>Identification</strong><br />
of State Space Models via Exploitation of Displacement<br />
Structure”. Automatica, Special Issue on Statistical Signal<br />
Process<strong>in</strong>g <strong>and</strong> Control, 30(1):45–59, Jan. 1994.<br />
[De 88] B. De Moor. Mathematical Concepts <strong>and</strong> Techniques for Model<strong>in</strong>g<br />
of Static <strong>and</strong> Dynamic <strong>System</strong>s. PhD thesis, Kath. Universiteit<br />
Leuven, Heverlee, Belgium, 1988.<br />
[DPS95] M. Deistler, K. Peternell, <strong>and</strong> W. Scherrer. “Consistency<br />
<strong>and</strong> Relative Efficiency of <strong>Subspace</strong> <strong>Methods</strong>”. Automatica,<br />
special issue on trends <strong>in</strong> system identification, 31(12):1865–<br />
1875, Dec. 1995.<br />
[DVVM88] B. De Moor, J. V<strong>and</strong>ewalle, L. V<strong>and</strong>enberghe, <strong>and</strong> P. Van<br />
Mieghem. “A Geometrical Strategy for the <strong>Identification</strong> of<br />
State Space Models of L<strong>in</strong>ear Multivariable <strong>System</strong>s with S<strong>in</strong>gular<br />
Value Decomposition”. In Proc. IFAC 88, pages 700–<br />
704, Beij<strong>in</strong>g, Ch<strong>in</strong>a, 1988.<br />
[Eyk74] P. Eykhoff. “<strong>System</strong> <strong>Identification</strong>, Parameter <strong>and</strong> State Estimation”.<br />
Wiley, 1974.<br />
[Fau76] P.L. Faurre. Stochastic realization algorithms. In R.K. Mehra<br />
<strong>and</strong> D.G. La<strong>in</strong>iotis, editors, <strong>System</strong> <strong>Identification</strong>: Advances<br />
<strong>and</strong> Case Studies. Academic Press, NY, 1976.
[FFLC95] A. Flieller, A. Ferreol, P. Larzabal, <strong>and</strong> H. Clergeot. “Robust<br />
Bear<strong>in</strong>g Estimation <strong>in</strong> the Presence of Direction-Dependent<br />
Modell<strong>in</strong>g Errors: Identifiability <strong>and</strong> Treatment”. In Proc.<br />
ICASSP, pages 1884–1887, Detroit, MI, 1995.<br />
[Fri90a] B. Friedl<strong>and</strong>er. “A Sensitivity Analysis of the MUSIC Algorithm”.<br />
IEEE Trans. on ASSP, 38(10):1740–1751, October<br />
1990.<br />
[Fri90b] B. Friedl<strong>and</strong>er. “Sensitivity of the Maximum Likelihood<br />
Direction F<strong>in</strong>d<strong>in</strong>g Algorithm”. IEEE Trans. on AES,<br />
26(6):953–968, November 1990.<br />
[Fuc92] J.J. Fuchs. “Estimation of the Number of Signals <strong>in</strong> the<br />
Presence of Unknown Correlated <strong>Sensor</strong> Noise”. IEEE Trans.<br />
on Signal Process<strong>in</strong>g, SP-40(5):1053–1061, May 1992.<br />
[FW94] B. Friedlander and A.J. Weiss. “Effects of Model Errors on Waveform Estimation Using the MUSIC Algorithm”. IEEE Trans. on SP, SP-42(1):147–155, Jan. 1994.
[GJO96] B. Göransson, M. Jansson, <strong>and</strong> B. Ottersten. “Spatial <strong>and</strong><br />
Temporal Frequency Estimation of Uncorrelated Signals Us<strong>in</strong>g<br />
<strong>Subspace</strong> Fitt<strong>in</strong>g”. In Proceed<strong>in</strong>gs of 8th IEEE Signal<br />
Process<strong>in</strong>g Workshop on Statistical Signal <strong>and</strong> Array Process<strong>in</strong>g,<br />
pages 94–96, Corfu, Greece, June 1996.<br />
[GO97] B. Göransson <strong>and</strong> B. Ottersten. “Efficient Direction Estimation<br />
of Uncorrelated Sources Us<strong>in</strong>g Polynomial Root<strong>in</strong>g”.<br />
In Proc. 13th Intl. Conf. DSP’97, pages 935–938, Santor<strong>in</strong>i,<br />
Greece, 1997.<br />
[Goo63] N.R. Goodman. “Statistical Analysis Based on a Certa<strong>in</strong><br />
Multivariate Complex Gaussian Distribution (An Introduction)”.<br />
Annals Math. Stat., Vol. 34:152–176, 1963.<br />
[Gop69] B. Gop<strong>in</strong>ath. “<strong>On</strong> the <strong>Identification</strong> of L<strong>in</strong>ear Time-Invariant<br />
<strong>System</strong>s from Input-Output Data”. The Bell <strong>System</strong> Technical<br />
Journal, Vol. 48(5):1101–1113, 1969.<br />
[Gor97] A. Gorokhov. “Bl<strong>in</strong>d Separation of Convolutive Mixtures:<br />
Second Order <strong>Methods</strong>”. PhD thesis, Ecole Nationale Superieure<br />
des Telecommunications (ENST), Paris, 1997.
[GP77] G. C. Goodw<strong>in</strong> <strong>and</strong> R. L. Payne. “Dynamic <strong>System</strong> <strong>Identification</strong>:<br />
Experiment Design <strong>and</strong> Data Analysis”, volume 136<br />
of Mathematics <strong>in</strong> Science <strong>and</strong> Eng<strong>in</strong>eer<strong>in</strong>g Series. Academic<br />
Press, 1977.<br />
[Gra81] A. Graham. Kronecker Products <strong>and</strong> Matrix Calculus with<br />
Applications. Ellis Horwood Ltd, Chichester, UK, 1981.<br />
[Gus97] T. Gustafsson. “<strong>System</strong> <strong>Identification</strong> Us<strong>in</strong>g <strong>Subspace</strong>-based<br />
Instrumental Variable <strong>Methods</strong>”. In Proc. 11th IFAC Symposium<br />
on <strong>System</strong> <strong>Identification</strong>, pages 1119–1124, Fukuoka,<br />
Japan, 1997.<br />
[GV96] G.H. Golub <strong>and</strong> C.F. Van Loan. Matrix Computations, third<br />
edition. Johns Hopk<strong>in</strong>s University Press, Baltimore, MD.,<br />
1996.<br />
[GW74] K. Glover <strong>and</strong> J. C. Willems. “Parameterizations of L<strong>in</strong>ear<br />
Dynamical <strong>System</strong>s: Canonical Forms <strong>and</strong> Identifiability”.<br />
IEEE Trans. on Automatic Control, AC-19(6):640–646, Dec.<br />
1974.<br />
[GW84] M. Gevers <strong>and</strong> V. Wertz. “Uniquely Identifiable State-space<br />
<strong>and</strong> ARMA Parameterizations for Multivariable L<strong>in</strong>ear <strong>System</strong>s”.<br />
Automatica, 20(3):333–347, 1984.<br />
[Haa97a] M. Haardt. “Efficient <strong>On</strong>e-, Two-, <strong>and</strong> Multidimensional<br />
High-Resolution Array Signal Process<strong>in</strong>g”. PhD thesis, Technical<br />
University of Munich, Shaker Verlag, 1997.<br />
[Haa97b] M. Haardt. “Structured Least Squares to Improve the Performance<br />
of ESPRIT-Type Algorithms”. IEEE Trans. Signal<br />
Process<strong>in</strong>g, 45(3):792–799, Mar. 1997.<br />
[Hay91] S. Hayk<strong>in</strong>. Advances <strong>in</strong> Spectrum Analysis <strong>and</strong> Array Process<strong>in</strong>g,<br />
edited by S. Hayk<strong>in</strong>, volume 2. Prentice-Hall, Englewood<br />
Cliffs, NJ, 1991.<br />
[HD88] E. J. Hannan <strong>and</strong> M. Deistler. The Statistical Theory of<br />
L<strong>in</strong>ear <strong>System</strong>s. J. Wiley & Sons, N. Y., 1988.<br />
[HK66] B.L. Ho <strong>and</strong> R.E. Kalman. “Efficient Construction of L<strong>in</strong>ear<br />
State Variable Models from Input/Output Functions”.<br />
Regelungstechnik, 14:545–548, 1966.
[Hot33] H. Hotell<strong>in</strong>g. “Analysis of a Complex of Statistical Variables<br />
<strong>in</strong>to Pr<strong>in</strong>cipal Components”. J. Educ. Psych., 24:417–<br />
441,498–520, 1933.<br />
[Jaf88] A. G. Jaffer. “Maximum Likelihood Direction F<strong>in</strong>d<strong>in</strong>g of<br />
Stochastic Sources: A Separable Solution”. In Proc. ICASSP<br />
88, volume 5, pages 2893–2896, New York, April 1988.<br />
[Jan95] M. Jansson. “<strong>On</strong> Performance Analysis of <strong>Subspace</strong> <strong>Methods</strong><br />
<strong>in</strong> <strong>System</strong> <strong>Identification</strong> <strong>and</strong> <strong>Sensor</strong> Array Process<strong>in</strong>g”. Technical<br />
Report TRITA-REG-9503, ISSN 0347-1071, Automatic<br />
Control, Signals, <strong>Sensor</strong>s & <strong>System</strong>s, <strong>KTH</strong>, Stockholm, June<br />
1995. Licentiate Thesis.<br />
[JD93] D.H. Johnson <strong>and</strong> D.E. Dudgeon. Array Signal Process<strong>in</strong>g<br />
– Concepts <strong>and</strong> Techniques. Prentice-Hall, Englewood Cliffs,<br />
NJ, 1993.<br />
[JGO97a] M. Jansson, B. Göransson, <strong>and</strong> B. Ottersten. “A <strong>Subspace</strong><br />
Method for Direction of Arrival Estimation of Uncorrelated<br />
Emitter Signals”. Submitted to IEEE Trans. on Signal Process<strong>in</strong>g,<br />
Aug. 1997.<br />
[JGO97b] M. Jansson, B. Göransson, <strong>and</strong> B. Ottersten. “Analysis of a<br />
<strong>Subspace</strong>-Based Spatial Frequency Estimator”. In Proceed<strong>in</strong>gs<br />
of ICASSP’97, pages 4001–4004, München, Germany,<br />
April 1997.<br />
[JO95] M. Jansson <strong>and</strong> B. Ottersten. “Covariance Preprocess<strong>in</strong>g <strong>in</strong><br />
Weighted <strong>Subspace</strong> Fitt<strong>in</strong>g”. In IEEE/IEE workshop on signal<br />
process<strong>in</strong>g methods <strong>in</strong> multipath environments, pages 23–<br />
32, Glasgow, Scotl<strong>and</strong>, 1995.<br />
[Joh93] R. Johansson. “<strong>System</strong> Model<strong>in</strong>g <strong>and</strong> <strong>Identification</strong>”. Information<br />
<strong>and</strong> <strong>System</strong> Sciences Series. Prentice Hall, 1993.<br />
[JP85] J.N. Juang <strong>and</strong> R.S. Pappa. “An Eigensystem Realization<br />
Algorithm for Modal Parameter <strong>Identification</strong> <strong>and</strong> Model Reduction”.<br />
J. Guidance, Control <strong>and</strong> Dynamics, 8(5):620–627,<br />
1985.<br />
[JSO97] M. Jansson, A. L. Sw<strong>in</strong>dlehurst, <strong>and</strong> B. Ottersten. “Weighted<br />
<strong>Subspace</strong> Fitt<strong>in</strong>g for General Array Error Models”. Submitted<br />
to IEEE Trans. on Signal Process<strong>in</strong>g, Jul. 1997.
[JW95] M. Jansson <strong>and</strong> B. Wahlberg. “<strong>On</strong> Weight<strong>in</strong>g <strong>in</strong> State-Space<br />
<strong>Subspace</strong> <strong>System</strong> <strong>Identification</strong>”. In Proc. European Control<br />
Conference, ECC’95, pages 435–440, Roma, Italy, 1995.<br />
[JW96a] M. Jansson <strong>and</strong> B. Wahlberg. “A L<strong>in</strong>ear Regression Approach<br />
to State-Space <strong>Subspace</strong> <strong>System</strong> <strong>Identification</strong>”. Signal<br />
Process<strong>in</strong>g (EURASIP), Special Issue on <strong>Subspace</strong> <strong>Methods</strong>,<br />
Part II: <strong>System</strong> <strong>Identification</strong>, 52(2):103–129, July 1996.<br />
[JW96b] M. Jansson <strong>and</strong> B. Wahlberg. “<strong>On</strong> Consistency of <strong>Subspace</strong><br />
<strong>System</strong> <strong>Identification</strong> <strong>Methods</strong>”. In Prepr<strong>in</strong>ts of 13th IFAC<br />
World Congress, pages 181–186, San Francisco, California,<br />
1996.<br />
[JW97a] M. Jansson <strong>and</strong> B. Wahlberg. “Counterexample to General<br />
Consistency of <strong>Subspace</strong> <strong>System</strong> <strong>Identification</strong> <strong>Methods</strong>”. In<br />
11th IFAC Symposium on <strong>System</strong> <strong>Identification</strong>, pages 1677–<br />
1682, Fukuoka, Japan, 1997.<br />
[JW97b] M. Jansson <strong>and</strong> B. Wahlberg. “<strong>On</strong> Consistency of <strong>Subspace</strong><br />
<strong>Methods</strong> for <strong>System</strong> <strong>Identification</strong>”. Submitted to Automatica,<br />
Jun. 1997.<br />
[KAB83] S. Y. Kung, K. S. Arun, and D. V. Bhaskar Rao. “State-Space Singular Value Decomposition Based Method for the Harmonic Retrieval Problem”. J. Opt. Soc. Amer., 73:1799–1811, Dec. 1983.
[KB86] M. Kaveh <strong>and</strong> A. J. Barabell. “The Statistical Performance of<br />
the MUSIC <strong>and</strong> the M<strong>in</strong>imum-Norm Algorithms <strong>in</strong> Resolv<strong>in</strong>g<br />
Plane Waves <strong>in</strong> Noise”. IEEE Trans. ASSP, ASSP-34(2):331–<br />
341, April 1986. Correction <strong>in</strong> ASSP-34(3), June 1986, p. 633.<br />
[KDS88] A.M. K<strong>in</strong>g, U.B. Desai, <strong>and</strong> R.E. Skelton. “A Generalized<br />
Approach to q-Markov Covariance Equivalent Realizations<br />
of Discrete <strong>System</strong>s”. Automatica, 24(4):507–515, 1988.<br />
[KF93] M. Koerber <strong>and</strong> D. Fuhrmann. “Array Calibration by<br />
Fourier Series Parameterization: Scaled Pr<strong>in</strong>ciple Components<br />
Method”. In Proc. ICASSP, pages IV340–IV343, M<strong>in</strong>neapolis,<br />
M<strong>in</strong>n., 1993.
[KP89] B. H. Kwon <strong>and</strong> S. U. Pillai. “A Self Inversive Array Process<strong>in</strong>g<br />
Scheme for Angle-of-Arrival Estimation”. Signal Process<strong>in</strong>g,<br />
17(3):259–277, Jul. 1989.<br />
[KSS94] A. Kangas, P. Stoica, <strong>and</strong> T. Söderström. “F<strong>in</strong>ite Sample <strong>and</strong><br />
Model<strong>in</strong>g Error Effects on ESPRIT <strong>and</strong> MUSIC Direction<br />
Estimators”. IEE Proc. Radar, Sonar, & Navig., 141:249–<br />
255, 1994.<br />
[KSS96] A. Kangas, P. Stoica, <strong>and</strong> T. Söderström. “Large Sample<br />
Analysis of MUSIC <strong>and</strong> M<strong>in</strong>-Norm Direction Estimators <strong>in</strong><br />
the Presence of Model Errors”. Circ., Sys., <strong>and</strong> Sig. Proc.,<br />
15:377–393, 1996.<br />
[Kun78] S.Y. Kung. “A New <strong>Identification</strong> <strong>and</strong> Model Reduction Algorithm<br />
via S<strong>in</strong>gular Value Decomposition”. In Proc. 12th<br />
Asilomar Conf. on Circuits, <strong>System</strong>s <strong>and</strong> Computers, pages<br />
705–714, Pacific Grove, CA, November 1978.<br />
[Kun81] S. Y. Kung. “A Toeplitz Approximation Method <strong>and</strong> Some<br />
Applications”. In Proc. Int. Symp. Mathematical Theory of<br />
Networks <strong>and</strong> <strong>System</strong>s, pages 262–266, Santa Monica, CA.,<br />
Aug. 1981.<br />
[KV96] H. Krim <strong>and</strong> M. Viberg. “Two Decades of Array Signal Process<strong>in</strong>g<br />
Research: The Parametric Approach”. IEEE Signal<br />
Process<strong>in</strong>g Magaz<strong>in</strong>e, 13(4):67–94, Jul. 1996.<br />
[Lar83] W. E. Larimore. “<strong>System</strong> <strong>Identification</strong>, Reduced-Order Filter<strong>in</strong>g<br />
<strong>and</strong> Model<strong>in</strong>g via Canonical Variate Analysis”. In<br />
Proc. ACC, pages 445–451, June 1983.<br />
[Lar90] W. E. Larimore. “Canonical Variate Analysis <strong>in</strong> <strong>Identification</strong>,<br />
Filter<strong>in</strong>g <strong>and</strong> Adaptive Control”. In Proc. 29th CDC,<br />
pages 596–604, Honolulu, Hawaii, December 1990.<br />
[Lar94] W. E. Larimore. “The Optimality of Canonical Variate <strong>Identification</strong><br />
by Example”. In Proc. 10th IFAC Symp. on Syst.<br />
Id., volume 2, pages 151–156, Copenhagen, Denmark, 1994.<br />
[Law40] D. N. Lawley. The estimation of factor load<strong>in</strong>gs by the<br />
method of maximum likelihood. In Proceed<strong>in</strong>gs of the Royal<br />
Society of Ed<strong>in</strong>burgh, Sec. A., volume 60, pages 64–82, 1940.
[Liu92] K. Liu. "Identification of Multi-Input and Multi-Output Systems by Observability Range Space Extraction". In Proc. CDC, pages 915–920, Tucson, AZ, 1992.

[Lju87] L. Ljung. System Identification: Theory for the User. Prentice-Hall, Englewood Cliffs, NJ, 1987.

[Lju91] L. Ljung. "A Simple Start-up Procedure for Canonical Form State Space Identification, Based on Subspace Approximation". In Proc. 30th IEEE Conf. on Decision and Control, pages 1333–1336, Brighton, U.K., 1991.

[LM87] J. Lo and S. Marple. "Eigenstructure Methods for Array Sensor Localization". In Proc. ICASSP, pages 2260–2263, Dallas, TX, 1987.

[LS91] K. Liu and R. E. Skelton. "Identification and Control of NASA's ACES Structure". In Proc. American Control Conference, pages 3000–3006, Boston, MA, 1991.

[LV92] F. Li and R. Vaccaro. "Sensitivity Analysis of DOA Estimation Algorithms to Sensor Errors". IEEE Trans. AES, AES-28(3):708–717, July 1992.

[McK94a] T. McKelvey. "Fully Parametrized State-Space Models in System Identification". In Proc. 10th IFAC Symp. on Syst. Id., volume 2, pages 373–378, Copenhagen, Denmark, 1994.

[McK94b] T. McKelvey. On State-Space Models in System Identification. Technical Report LiU-TEK-LIC-1994:33, Linköping University, May 1994. Licentiate Thesis.

[MDVV89] M. Moonen, B. De Moor, L. Vandenberghe, and J. Vandewalle. "On- and Off-line Identification of Linear State-Space Models". Int. J. Control, 49(1):219–232, 1989.

[MKB79] K. V. Mardia, J. T. Kent, and J. M. Bibby. Multivariate Analysis. Academic Press, London, 1979.

[MR94] D. McArthur and J. Reilly. "A Computationally Efficient Self-Calibrating Direction of Arrival Estimator". In Proc. ICASSP, pages IV-201–IV-205, Adelaide, Australia, 1994.
[NK94] V. Nagesha and S. Kay. "On Frequency Estimation with the IQML Algorithm". IEEE Trans. on Signal Processing, 42(9):2509–2513, Sep. 1994.

[NN93] B. Ng and A. Nehorai. "Active Array Sensor Location Calibration". In Proc. ICASSP, pages IV21–IV24, Minneapolis, Minn., 1993.

[OAKP97] B. Ottersten, D. Asztély, M. Kristensson, and S. Parkvall. "A Statistical Approach to Subspace Based Estimation with Applications in Telecommunications". In S. Van Huffel, editor, 2nd International Workshop on TLS and Errors-in-Variables Modeling, pages 285–294, Leuven, Belgium, 1997. SIAM.

[ORK89] B. Ottersten, R. Roy, and T. Kailath. "Signal Waveform Estimation in Sensor Array Processing". In Proc. 23rd Asilomar Conf. Sig., Syst., Comput., pages 787–791, Nov. 1989.

[Ott89] B. Ottersten. Parametric Subspace Fitting Methods for Array Signal Processing. PhD thesis, Stanford University, Stanford, CA, Dec. 1989.

[OV94] B. Ottersten and M. Viberg. "A Subspace-Based Instrumental Variable Method for State-Space System Identification". In Proc. 10th IFAC Symp. on Syst. Id., volume 2, pages 139–144, Copenhagen, Denmark, 1994.

[OVK90] B. Ottersten, M. Viberg, and T. Kailath. "Asymptotic Robustness of Sensor Array Processing Methods". In Proc. ICASSP, pages 2635–2638, Albuquerque, NM, April 1990.

[OVSN93] B. Ottersten, M. Viberg, P. Stoica, and A. Nehorai. "Exact and Large Sample ML Techniques for Parameter Estimation and Detection in Array Processing". In Haykin, Litva, and Shepherd, editors, Radar Array Processing, pages 99–151. Springer-Verlag, Berlin, 1993.

[Pil89] S. U. Pillai. Array Signal Processing. Springer-Verlag, New York, 1989.

[Pis73] V. F. Pisarenko. "The Retrieval of Harmonics from a Covariance Function". Geophys. J. Roy. Astron. Soc., 33:347–366, 1973.
[PK85] A. Paulraj and T. Kailath. "Direction-of-Arrival Estimation by Eigenstructure Methods with Unknown Sensor Gain and Phase". In Proc. IEEE ICASSP, pages 17.7.1–17.7.4, Tampa, Fla., March 1985.

[PK89] S. U. Pillai and B. H. Kwon. "Performance Analysis of MUSIC-Type High Resolution Estimators for Direction Finding in Correlated and Coherent Scenes". IEEE Trans. on ASSP, 37(8):1176–1189, Aug. 1989.

[PPL90] D. Pearson, S. U. Pillai, and Y. Lee. "An Algorithm for Near-Optimal Placements of Sensor Elements". IEEE Transactions on Information Theory, 36(6):1280–1284, Nov. 1990.

[PRK86] A. Paulraj, R. Roy, and T. Kailath. "A Subspace Rotation Approach to Signal Parameter Estimation". Proceedings of the IEEE, pages 1044–1045, July 1986.

[PSD95] K. Peternell, W. Scherrer, and M. Deistler. "Statistical Analysis of Subspace Identification Methods". In Proc. European Control Conference, ECC'95, pages 1342–1347, Roma, Italy, 1995.

[PSD96] K. Peternell, W. Scherrer, and M. Deistler. "Statistical Analysis of Novel Subspace Identification Methods". Signal Processing (EURASIP), Special Issue on Subspace Methods, Part II: System Identification, 52(2):161–177, July 1996.

[Qua82] A. H. Quazi. "Array Beam Response in the Presence of Amplitude and Phase Fluctuations". J. Acoust. Soc. Amer., 72(1):171–180, July 1982.

[Qua84] N. Quang. "On the Uniqueness of the Maximum-Likelihood Estimate of Structured Covariance Matrices". IEEE Trans. on Acoustics, Speech, and Signal Processing, ASSP-32(6):1249–1251, Dec. 1984.

[RA92] B. D. Rao and K. S. Arun. "Model Based Processing of Signals: A State Space Approach". Proceedings of the IEEE, 80(2):283–309, Feb. 1992.
[RH80] D. J. Ramsdale and R. A. Howerton. "Effect of Element Failure and Random Errors in Amplitude and Phase on the Sidelobe Level Attainable with a Linear Array". J. Acoust. Soc. Amer., 68(3):901–906, Sept. 1980.
[RH89] B. D. Rao and K. V. S. Hari. "Performance Analysis of ESPRIT and TAM in Determining the Direction of Arrival of Plane Waves in Noise". IEEE Trans. ASSP, ASSP-37(12):1990–1995, Dec. 1989.

[RH93] B. D. Rao and K. V. S. Hari. "Weighted Subspace Methods and Spatial Smoothing: Analysis and Comparison". IEEE Trans. ASSP, 41(2):788–803, Feb. 1993.

[RK89] R. Roy and T. Kailath. "ESPRIT – Estimation of Signal Parameters Via Rotational Invariance Techniques". IEEE Trans. ASSP, ASSP-37(7):984–995, Jul. 1989.

[RPK86] R. Roy, A. Paulraj, and T. Kailath. "ESPRIT – A Subspace Rotation Approach to Estimation of Parameters of Cisoids in Noise". IEEE Trans. on ASSP, ASSP-34(4):1340–1342, October 1986.

[RS87a] Y. Rockah and P. M. Schultheiss. "Array Shape Calibration Using Sources in Unknown Locations – Part I: Far-Field Sources". IEEE Trans. on ASSP, 35:286–299, March 1987.

[RS87b] Y. Rockah and P. M. Schultheiss. "Array Shape Calibration Using Sources in Unknown Locations – Part II: Near-Field Sources and Estimator Implementation". IEEE Trans. on ASSP, 35:724–735, June 1987.

[SCE95] P. Stoica, M. Cedervall, and A. Eriksson. "Combined Instrumental Variable and Subspace Fitting Approach to Parameter Estimation of Noisy Input-Output Systems". IEEE Trans. on Signal Processing, 43(10):2386–2397, Oct. 1995.

[Sch79] R. O. Schmidt. "Multiple Emitter Location and Signal Parameter Estimation". In Proc. RADC Spectrum Estimation Workshop, pages 243–258, Rome, NY, 1979.

[Sch81] R. O. Schmidt. A Signal Subspace Approach to Multiple Emitter Location and Spectral Estimation. PhD thesis, Stanford Univ., Stanford, CA, Nov. 1981.
[SH92] V. C. Soon and Y. F. Huang. "An Analysis of ESPRIT Under Random Sensor Uncertainties". IEEE Trans. on Sig. Proc., SP-40(9):2353–2358, 1992.

[SJ97] P. Stoica and M. Jansson. "On Forward-Backward MODE for Array Signal Processing". Digital Signal Processing – A Review Journal, 1997. Accepted.

[SK90] A. Swindlehurst and T. Kailath. "On the Sensitivity of the ESPRIT Algorithm to Non-Identical Subarrays". Sādhanā, Academy Proc. in Eng. Sciences, 15(3):197–212, November 1990.

[SK92] A. Swindlehurst and T. Kailath. "A Performance Analysis of Subspace-Based Methods in the Presence of Model Errors – Part 1: The MUSIC Algorithm". IEEE Trans. on Sig. Proc., 40(7):1758–1774, July 1992.

[SK93] A. Swindlehurst and T. Kailath. "A Performance Analysis of Subspace-Based Methods in the Presence of Model Errors – Part 2: Multidimensional Algorithms". IEEE Trans. on Sig. Proc., 41(9):2882–2890, Sept. 1993.

[SM97] P. Stoica and R. Moses. Introduction to Spectral Analysis. Prentice Hall, Upper Saddle River, New Jersey, 1997.

[SN89a] P. Stoica and A. Nehorai. "MUSIC, Maximum Likelihood and Cramér-Rao Bound". IEEE Trans. ASSP, ASSP-37:720–741, May 1989.

[SN89b] P. Stoica and A. Nehorai. "MUSIC, Maximum Likelihood and Cramér-Rao Bound: Further Results and Comparisons". In ICASSP'89, 1989.

[SN90] P. Stoica and A. Nehorai. "Performance Study of Conditional and Unconditional Direction-of-Arrival Estimation". IEEE Trans. ASSP, ASSP-38:1783–1795, October 1990.

[SROK95] A. L. Swindlehurst, R. Roy, B. Ottersten, and T. Kailath. "A Subspace Fitting Method for Identification of Linear State Space Models". IEEE Trans. Autom. Control, 40(2):311–316, Feb. 1995.
[SS81] T. Söderström and P. Stoica. "Comparison of Some Instrumental Variable Methods – Consistency and Accuracy Aspects". Automatica, 17(1):101–115, 1981.

[SS83] T. Söderström and P. G. Stoica. Instrumental Variable Methods for System Identification. Springer-Verlag, Berlin, 1983.

[SS89] T. Söderström and P. Stoica. System Identification. Prentice-Hall International, Hemel Hempstead, UK, 1989.

[SS90a] P. Stoica and K. Sharman. "Maximum Likelihood Methods for Direction-of-Arrival Estimation". IEEE Trans. ASSP, ASSP-38:1132–1143, July 1990.

[SS90b] P. Stoica and K. Sharman. "Novel Eigenanalysis Method for Direction Estimation". IEE Proc. Part F, 137(1):19–26, Feb. 1990.

[SS91] P. Stoica and T. Söderström. "Statistical Analysis of MUSIC and Subspace Rotation Estimates of Sinusoidal Frequencies". IEEE Trans. ASSP, ASSP-39:1836–1847, Aug. 1991.

[SV95] P. Stoica and M. Viberg. "Maximum-Likelihood Parameter and Rank Estimation in Reduced-Rank Multivariate Linear Regressions". Technical Report CTH-TE-31, Chalmers University of Technology, Gothenburg, Sweden, July 1995. Submitted to IEEE Trans. SP.

[SV96] P. Stoica and M. Viberg. "Maximum-Likelihood Parameter and Rank Estimation in Reduced-Rank Multivariate Linear Regressions". IEEE Trans. on Signal Processing, 44(12):3069–3078, Dec. 1996.

[Swi96] A. Swindlehurst. "A Maximum a Posteriori Approach to Beamforming in the Presence of Calibration Errors". In Proc. 8th Workshop on Stat. Sig. and Array Proc., Corfu, Greece, June 1996.

[TO96] T. Trump and B. Ottersten. "Estimation of Nominal Direction of Arrival and Angular Spread Using an Array of Sensors". Signal Processing, 50(1-2):57–69, Apr. 1996.

[Tre71] H. L. Van Trees. Detection, Estimation and Modulation Theory, volume III. Wiley, New York, 1971.
[Van95] P. Van Overschee. Subspace Identification: Theory – Implementation – Application. PhD thesis, Katholieke Universiteit Leuven, Belgium, Feb. 1995.

[VB88] B. D. Van Veen and K. M. Buckley. "Beamforming: A Versatile Approach to Spatial Filtering". Signal Processing Magazine, pages 4–24, Apr. 1988.

[VD91] M. Verhaegen and E. Deprettere. "A Fast Subspace, Recursive MIMO State-space Model Identification Algorithm". In Proc. 30th IEEE Conf. on Decision and Control, pages 1349–1354, Brighton, U.K., 1991.

[VD92] M. Verhaegen and P. Dewilde. "Subspace Model Identification. Part I: The Output-Error State-Space Model Identification Class of Algorithms". Int. J. Control, 56:1187–1210, 1992.

[VD93] P. Van Overschee and B. De Moor. "Subspace Algorithms for the Stochastic Identification Problem". Automatica, 29(3):649–660, 1993.

[VD94a] P. Van Overschee and B. De Moor. "A Unifying Theorem for Three Subspace System Identification Algorithms". In Proc. 10th IFAC Symp. on Syst. Id., Copenhagen, Denmark, 1994.

[VD94b] P. Van Overschee and B. De Moor. "N4SID: Subspace Algorithms for the Identification of Combined Deterministic-Stochastic Systems". Automatica, Special Issue on Statistical Signal Processing and Control, 30(1):75–93, Jan. 1994.

[VD95a] P. Van Overschee and B. De Moor. "About the Choice of State Space Basis in Combined Deterministic-Stochastic Subspace Identification". Accepted for publication in Automatica, 1995.

[VD95b] P. Van Overschee and B. De Moor. "Choice of State-Space Basis in Combined Deterministic-Stochastic Subspace Identification". Automatica, Special Issue on Trends in System Identification, 31(12):1877–1883, Dec. 1995.

[VD95c] P. Van Overschee and B. De Moor. "A Unifying Theorem for Three Subspace System Identification Algorithms". Automatica, Special Issue on Trends in System Identification, 31(12):1853–1864, Dec. 1995.
[VD96] P. Van Overschee and B. De Moor. Subspace Identification for Linear Systems: Theory–Implementation–Applications. Kluwer Academic Publishers, 1996.

[VDS93] A.-J. Van Der Veen, E. F. Deprettere, and A. L. Swindlehurst. "Subspace-Based Signal Analysis Using Singular Value Decomposition". Proceedings of the IEEE, 81(9):1277–1308, Sept. 1993.

[Ver91] M. Verhaegen. "A Novel Non-iterative MIMO State Space Model Identification Technique". In Proc. 9th IFAC/IFORS Symp. on Identification and System Parameter Estimation, pages 1453–1458, Budapest, 1991.

[Ver93] M. Verhaegen. "Subspace Model Identification. Part III: Analysis of the Ordinary Output-Error State Space Model Identification Algorithm". Int. J. Control, 58:555–586, 1993.

[Ver94] M. Verhaegen. "Identification of the Deterministic Part of MIMO State Space Models Given in Innovations Form from Input-Output Data". Automatica, Special Issue on Statistical Signal Processing and Control, 30(1):61–74, Jan. 1994.

[Vib89] M. Viberg. Subspace Fitting Concepts in Sensor Array Processing. PhD thesis, Linköping University, Linköping, Sweden, Oct. 1989.

[Vib94] M. Viberg. "Subspace Methods in System Identification". In Proc. 10th IFAC Symp. on Syst. Id., volume 1, pages 1–12, Copenhagen, Denmark, 1994.

[Vib95] M. Viberg. "Subspace-based Methods for the Identification of Linear Time-invariant Systems". Automatica, Special Issue on Trends in System Identification, 31(12):1835–1851, Dec. 1995.

[VL82] A. J. M. Van Overbeek and L. Ljung. "On-line Structure Selection for Multivariable State-space Models". Automatica, 18(5):529–543, 1982.

[VO91] M. Viberg and B. Ottersten. "Sensor Array Processing Based on Subspace Fitting". IEEE Trans. SP, SP-39(5):1110–1121, May 1991.
[VOK91a] M. Viberg, B. Ottersten, and T. Kailath. "Detection and Estimation in Sensor Arrays Using Weighted Subspace Fitting". IEEE Trans. SP, SP-39(11):2436–2449, Nov. 1991.

[VOK91b] M. Viberg, B. Ottersten, and T. Kailath. "Subspace Based Detection for Linear Structural Relations". J. of Combinatorics, Information, & Systems Sciences, 16(2-3):170–189, 1991.

[VOWL91] M. Viberg, B. Ottersten, B. Wahlberg, and L. Ljung. "A Statistical Perspective on State-Space Modeling Using Subspace Methods". In Proc. 30th IEEE Conf. on Decision and Control, pages 1337–1342, Brighton, England, Dec. 1991.

[VOWL93] M. Viberg, B. Ottersten, B. Wahlberg, and L. Ljung. "Performance of Subspace-Based System Identification Methods". In Proc. IFAC 93, Sydney, Australia, July 1993.

[VS94a] M. Viberg and A. L. Swindlehurst. "A Bayesian Approach to Auto-Calibration for Parametric Array Signal Processing". IEEE Trans. on Sig. Proc., 42(12):3495–3507, Dec. 1994.

[VS94b] M. Viberg and A. L. Swindlehurst. "Analysis of the Combined Effects of Finite Samples and Model Errors on Array Processing Performance". IEEE Trans. on Sig. Proc., 42(11):3073–3083, Nov. 1994.

[VWO97] M. Viberg, B. Wahlberg, and B. Ottersten. "Analysis of State Space System Identification Methods Based on Instrumental Variables and Subspace Fitting". Automatica, 1997. Accepted.

[Wax91] M. Wax. "Detection and Localization of Multiple Sources via the Stochastic Signals Model". IEEE Trans. on SP, 39(11):2450–2456, Nov. 1991.

[Wax92] M. Wax. "Detection and Localization of Multiple Sources in Noise with Unknown Covariance". IEEE Trans. on Signal Processing, SP-40(1):245–249, Jan. 1992.

[WF89] A. J. Weiss and B. Friedlander. "Array Shape Calibration Using Sources in Unknown Locations – A Maximum Likelihood Approach". IEEE Trans. on ASSP, 37(12):1958–1966, Dec. 1989.
[WJ94] B. Wahlberg and M. Jansson. "4SID Linear Regression". In Proc. IEEE 33rd Conf. on Decision and Control, pages 2858–2863, Orlando, USA, December 1994.

[WOV91] B. Wahlberg, B. Ottersten, and M. Viberg. "Robust Signal Parameter Estimation in the Presence of Array Perturbations". In Proc. IEEE ICASSP, pages 3277–3280, Toronto, Canada, 1991.

[WRM94] M. Wylie, S. Roy, and H. Messer. "Joint DOA Estimation and Phase Calibration of Linear Equispaced (LES) Arrays". IEEE Trans. on Sig. Proc., 42(12):3449–3459, Dec. 1994.

[WSW95] M. Wax, J. Sheinvald, and A. J. Weiss. "Localization of Correlated and Uncorrelated Signals in Colored Noise via Generalized Least Squares". In Proc. ICASSP 95, pages 2104–2107, Detroit, MI, 1995. IEEE.

[WWN88] K. M. Wong, R. S. Walker, and G. Niezgoda. "Effects of Random Sensor Motion on Bearing Estimation by the MUSIC Algorithm". IEE Proceedings, 135, Pt. F(3):233–250, June 1988.

[WZ89] M. Wax and I. Ziskind. "On Unique Localization of Multiple Sources by Passive Sensor Arrays". IEEE Trans. on ASSP, ASSP-37(7):996–1000, July 1989.

[YS95] J. Yang and A. Swindlehurst. "The Effects of Array Calibration Errors on DF-Based Signal Copy Performance". IEEE Trans. on Sig. Proc., 43(11):2724–2732, November 1995.

[YZ95] P. Yip and Y. Zhou. "A Self-Calibration Algorithm for Cyclostationary Signals and its Uniqueness Analysis". In Proc. ICASSP, pages 1892–1895, Detroit, MI, 1995.

[ZM74] H. P. Zeiger and A. J. McEwen. "Approximate Linear Realizations of Given Dimension via Ho's Algorithm". IEEE Trans. AC, AC-19:153, April 1974.

[ZW88] J. X. Zhu and H. Wang. "Effects of Sensor Position and Pattern Perturbations on CRLB for Direction Finding of Multiple Narrowband Sources". In Proc. 4th ASSP Workshop on Spectral Estimation and Modeling, pages 98–102, Minneapolis, MN, August 1988.