Optimal kernels for time-frequency analysis
Richard G. Baraniuk and Douglas L. Jones
Electrical and Computer Engineering
252 Engineering Research Laboratory
University of Illinois
Coordinated Science Laboratory
University of Illinois
1101 W. Springfield Ave.
Urbana, IL 61801 Urbana, IL 61801
(217) 333-0766 (217) 244-6823
ABSTRACT
Current bilinear time-frequency representations apply a fixed kernel to smooth the Wigner distribution. However, the choice of a fixed kernel limits the class of signals that can be analyzed effectively. This paper presents optimality criteria for the design of signal-dependent kernels that suppress cross-components while passing as much auto-component energy as possible, irrespective of the form of the signal. A fast algorithm for the optimal kernel solution makes the procedure competitive computationally with fixed kernel methods. Examples demonstrate the superior performance of the optimal kernel for a frequency modulated signal.
1. INTRODUCTION
Time-Frequency Distributions (TFD's), which indicate the energy content of a signal as a function of both time and frequency, are a powerful tool for time-varying signal analysis. The Wigner Distribution (WD)
$$W(t,\omega) = \int_{-\infty}^{\infty} s\left(t + \frac{\tau}{2}\right) s^*\left(t - \frac{\tau}{2}\right) e^{-j\omega\tau}\, d\tau \qquad (1)$$
is of great interest due to a number of attractive properties¹. However, it also has spurious cross-components and high noise sensitivity, both of which obscure the true signal features. Therefore, the WD is often convolved with a two-dimensional smoothing function that suppresses cross-components at the expense of signal energy concentration.
It is well known that all bilinear TFD's can be represented as smoothed versions of the WD²; that is, if $P(t,\omega)$ is a bilinear TFD, then
$$P(t,\omega) = W(t,\omega) ** \phi(t,\omega) \qquad (2)$$
for some function $\phi$ ("$**$" denotes two-dimensional convolution). Equation (2) may be rewritten using the two-dimensional inverse Fourier transform as
$$C(\theta,\tau) = A(\theta,\tau)\,\Phi(\theta,\tau), \qquad (3)$$
where $C(\theta,\tau)$, the inverse Fourier transform of $P(t,\omega)$, is known as the characteristic function of the distribution; $A(\theta,\tau)$, the transform of the WD, is called the Ambiguity Function (AF); and $\Phi(\theta,\tau)$, the transform of the smoothing function, is known as the kernel of the TFD. The AF is also given directly by
$$A(\theta,\tau) = \int_{-\infty}^{\infty} s\left(t + \frac{\tau}{2}\right) s^*\left(t - \frac{\tau}{2}\right) e^{j\theta t}\, dt. \qquad (4)$$
Equation (3) indicates that we can interpret any bilinear TFD as the two-dimensional Fourier transform of a weighted version of the AF.
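As an illustration of (4), the following is a minimal sketch of a discrete AF computed by direct summation. The circular indexing, the use of an integer lag index k to stand for a lag of 2k samples, and the function name `ambiguity` are assumptions made for this sketch rather than conventions taken from the paper.

```python
import cmath

def ambiguity(s):
    """Discrete ambiguity function on an N x N grid (circular indexing).

    A[m][k] = sum_n s[n+k] * conj(s[n-k]) * exp(+j*2*pi*m*n/N),
    so k indexes the lag (tau = 2k samples) and m the Doppler shift.
    """
    N = len(s)
    A = [[0j] * N for _ in range(N)]
    for k in range(N):
        # Instantaneous autocorrelation of the signal at lag index k.
        r = [s[(n + k) % N] * s[(n - k) % N].conjugate() for n in range(N)]
        for m in range(N):
            A[m][k] = sum(r[n] * cmath.exp(2j * cmath.pi * m * n / N)
                          for n in range(N))
    return A

# A short linear chirp, used only as a stand-in test signal.
N = 16
signal = [cmath.exp(1j * cmath.pi * 0.02 * n * n) for n in range(N)]
A = ambiguity(signal)
energy = sum(abs(x) ** 2 for x in signal)
```

In this sketch the sample at the origin, `A[0][0]`, equals the signal energy, which is why a kernel normalized to one at the origin preserves the total energy of the distribution.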
The kernel is frequently chosen to weight the AF such that the auto-components of the distribution are passed while the cross-components and noise are suppressed. In principle, this is possible when the auto-components and cross-components do not overlap. Many kernels have been proposed, but selection of a fixed kernel limits the class of signals for which the representation will perform well. That is, for any fixed kernel, it is always possible to find signals for which the TFD exhibits poor auto-component concentration or little cross-component suppression. (The same problem limits the performance of wavelet time-frequency analysis³, since the choice of a fixed analyzing wavelet restricts the class of signals which can be analyzed effectively.)
SPIE Vol. 1348 Advanced Signal-Processing Algorithms, Architectures, and Implementations (1990)
The limitations of fixed kernel time-frequency analysis can be illustrated by analyzing a simple signal with several different TFD's. The WD of the sum of two chirp signals of large effective time envelope is shown in Fig. 1. Although the auto-components are highly concentrated, there is a large cross-component. The Choi-Williams distribution⁴, which has an exponential kernel
$$\Phi_{CW}(\theta,\tau) = e^{-\theta^2\tau^2/\sigma}, \qquad (5)$$
works well for signals whose components have distributions nearly parallel with the time or frequency axis. It performs poorly, however, for signals with substantial frequency modulation, as seen in Fig. 2, since the kernel severely truncates the auto-components of such signals. The kernel generating the spectrogram is related to the AF of the analysis window $w(t)$ by
$$\Phi_S(\theta,\tau) = \int_{-\infty}^{\infty} w\left(t + \frac{\tau}{2}\right) w^*\left(t - \frac{\tau}{2}\right) e^{j\theta t}\, dt. \qquad (6)$$
Results are excellent for signal components that resemble the window⁵, but all mismatched components are distorted. Figure 3 displays a spectrogram computed using a Hamming window of length similar to the effective time width of the signal. It is poorly concentrated and obscures the true nature of the signal. If the analysis window, and hence the kernel, is matched to one of the signal components, the matched-filter spectrogram results (see Fig. 4). While the matched-filter technique can yield excellent results, it works for only one type of signal component and requires a priori knowledge of the form of the component.
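The behavior of the exponential kernel can be checked numerically. The sketch below evaluates the standard Choi-Williams form $e^{-\theta^2\tau^2/\sigma}$ at two points of the AF plane; the function name and the sample values of $\theta$, $\tau$ and $\sigma$ are illustrative assumptions. Because the AF of a linear chirp lies along a tilted line through the origin, auto-component samples away from the axes fall where this kernel is nearly zero.

```python
import math

def choi_williams_kernel(theta, tau, sigma):
    """Exponential kernel exp(-(theta * tau)^2 / sigma)."""
    return math.exp(-(theta * tau) ** 2 / sigma)

# On either axis the product theta * tau vanishes, so the AF is passed unchanged.
on_axis = choi_williams_kernel(0.0, 3.0, 1.0)

# Away from the axes the weight decays rapidly with |theta * tau|.
off_axis = choi_williams_kernel(2.0, 2.0, 1.0)
```

A larger `sigma` widens the pass region around the axes, at the cost of admitting more cross-component energy.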
Since the best kernel function depends on the signal to be analyzed, we expect to obtain good performance for a broad class of signals only by using a signal-dependent kernel. Signal-dependent kernels have been proposed by several authors. The adaptive spectrogram representation for speech signals developed by Glinski⁶ adapts the window based on a segmentation (provided by the user) of the signal into pitch periods. Jones and Boashash⁷ adapt the modulation rate of a fixed window to match an estimate of the signal's instantaneous frequency. Optimal smoothing kernels are considered by Andrieux et al.⁸, but only for simple signals of the form $s(t) = e^{j2\pi v(t)}$, and only for the restrictive class of Gaussian kernels. Kadambe, Boudreaux-Bartels and Duvaut⁹ utilize an adaptive filtering technique coupled with AR modeling and clustering to design kernels. Nuttall¹⁰ designs a kernel composed of Gaussian components based on information that the user provides after viewing the WD. Jones and Parks¹¹ develop a technique using Gaussian kernels which vary with time and frequency to maximize a local measure of signal-energy concentration.
Each of the methods described above is ad hoc, excessively restricts the class of allowable kernels, is computationally expensive, or requires human intervention. We propose a new procedure for selecting a signal-dependent kernel. Given a signal, the method automatically designs a kernel that is optimal with respect to a set of performance criteria. Since the class of kernels that we consider is large, good performance is expected for a wide range of signals. The procedure also has a computational complexity that is comparable to fixed-kernel techniques.
2. OPTIMAL KERNEL DESIGN
Rather than choosing an ad hoc method for signal-dependent kernel selection, it seems appropriate to formulate the procedure as an optimization problem. The problem formulation requires a class of kernels from which the optimal kernel is chosen, and a performance index that measures the quality of the time-frequency representation with respect to criteria deemed important by the designer. The kernel that maximizes the value of the performance measure is selected as the optimal kernel for the signal.
The class of kernels must be large enough to allow for good performance for all signals of interest in a given application. Likewise, the performance measure must be chosen to yield a tractable optimization problem that can be solved efficiently. An example of a useful performance index is a measure of the signal-energy concentration of the distribution¹¹. Clearly, the choice of kernel class and performance measure is crucial to the success of the method. However, once a satisfactory class and measure are found, kernel design for a wide range of signals is reduced to solving an optimization problem.
The optimal design concept can be generalized to classes of TFD's other than the bilinear by defining a subclass of allowable TFD's and a performance index. The formulation of optimization problems for TFD design is relatively simple in the bilinear case, because a bilinear TFD is completely specified by its two-dimensional smoothing kernel. Thus, we can find the optimal bilinear TFD for a signal simply by solving for the optimal kernel.
2.1 Continuous optimization formulation
This section develops an optimization problem formulation for kernel design that relies on the AF of the signal and the characteristic function representation of a TFD indicated by (3). We propose optimality criteria based on the AF for three reasons. First, the multiplicative operation of the kernel on the AF is easier to visualize than convolution of the WD with the smoothing function, which simplifies the construction of a quality measure. Second, the AF serves to separate the auto- and cross-components⁵. Third, the AF may lead to efficient computation of the optimal TFD, since the TFD is merely the two-dimensional Fourier transform of the product of the optimal kernel and the AF.
We consider an optimal kernel in the continuous case to be one that satisfies the following optimization problem:
$$\max_{\Phi} \int_0^{2\pi} \int_0^{\infty} |A(r,\psi)\,\Phi(r,\psi)|^2\, r\, dr\, d\psi \qquad (7)$$
subject to $\Phi(0,0) = 1$ and
$$\Phi(r_1,\psi) \ge \Phi(r_2,\psi) \quad \forall\, r_1 < r_2,\ \forall\, \psi, \qquad (8)$$
and subject to
$$\int_0^{2\pi} \int_0^{\infty} |\Phi(r,\psi)|^2\, r\, dr\, d\psi \le \alpha, \qquad (9)$$
where $A(r,\psi)$ is the AF of the signal in polar coordinates, and the kernel $\Phi(r,\psi)$ is assumed to be real and positive.
The performance measure (7) expresses our desire to pass as much auto-component energy as possible into the TFD for a kernel of fixed volume $\alpha$. The second constraint (8) forces the kernel to be radially nonincreasing. Since the AF auto-components are centered at the origin, this encourages the kernel to preferentially pass auto-components. The final constraint (9) restricts the size of the kernel so that cross-components are suppressed. An advantage of this formulation is that the constraints are insensitive to both the time-scale and orientation of the signal in time-frequency.
2.2 Discrete optimization formulation
In practice, TFD's are computed at discrete time and frequency locations, so we reformulate the optimization problem by discretizing equations (7)-(9). With suitably dense sampling, the discrete formulation converges to the continuous formulation. Performing the discretization, we define an optimal discrete kernel to be one that satisfies:
$$\max_{\Phi_d} \sum_{m} \sum_{n} |A_d(m,n)\,\Phi_d(m,n)|^2, \qquad (10)$$
subject to $\Phi_d(0,0) = 1$ and
$$\Phi_d(m,n) \text{ is radially nonincreasing}, \qquad (11)$$
and subject to
$$\sum_{m} \sum_{n} |\Phi_d(m,n)|^2 \le \alpha_d, \qquad (12)$$
where $A_d(m,n)$ is the $N \times N$ discrete AF of the signal to be analyzed, and the kernel $\Phi_d(m,n)$ is assumed to be real and positive. Note that since the AF is conjugate symmetric through the origin,
$$A_d(m,n) = A_d^*(-m,-n), \qquad (13)$$
the optimal kernel can be computed from a half-plane of AF samples.
The constraint that the kernel be radially nonincreasing can be implemented exactly only on a polar grid of samples. However, computing the AF and resulting TFD on a polar grid requires either the computation of a polar Fourier transform, for which no fast algorithm exists, or a costly interpolation from a rectangular grid. Therefore, we approximate the polar grid by a set of paths on a rectangular grid. Figure 5 illustrates a tree structure that approximates the radial dependencies of the kernel for the upper half-plane of a 64 x 64 rectangular grid. The nonincreasing constraint is enforced along each path from the origin to the edge. The branches of the tree are constructed to minimize the maximum deviation from the branch to the true radial line.
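One simple way to build such a set of paths is sketched below. It is not the minimax construction of Fig. 5: as an assumption made for illustration, each grid point is linked to the point on the next square ring toward the origin that lies closest to the scaled radial position. The function names are hypothetical.

```python
def parent(m, n):
    """Grid point one square ring closer to the origin, near the radial line."""
    k = max(abs(m), abs(n))          # index of the ring containing (m, n)
    if k == 0:
        return None                  # the origin has no predecessor
    return (round(m * (k - 1) / k), round(n * (k - 1) / k))

def path_to_origin(m, n):
    """Chain of predecessors along which the kernel must not increase."""
    path = [(m, n)]
    while path[-1] != (0, 0):
        path.append(parent(*path[-1]))
    return path

def is_radially_nonincreasing(kernel):
    """kernel maps (m, n) -> value on a square grid centred at the origin."""
    return all(kernel[parent(m, n)] >= value
               for (m, n), value in kernel.items() if (m, n) != (0, 0))

path = path_to_origin(5, 2)

# A cone-shaped kernel decreases from ring to ring, so it passes the check.
half = 3
cone = {(m, n): float(half - max(abs(m), abs(n)))
        for m in range(-half, half + 1) for n in range(-half, half + 1)}
cone_ok = is_radially_nonincreasing(cone)
```

Every point has exactly one predecessor, so the links form a tree rooted at the origin, and checking the constraint costs one comparison per grid point.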
3. OPTIMAL KERNEL SOLUTION
Since the performance measure and constraints are linear in $|\Phi_d|^2$, the optimal kernel may be found by applying linear programming¹² to solve for the $N^2$ unknowns $|\Phi_d(m,n)|^2$ (since $\Phi_d$ is assumed to be real and positive, knowing $|\Phi_d|^2$ is equivalent to knowing $\Phi_d$). Moreover, it can be shown that the optimal kernel takes on essentially only the values of one and zero.
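To make the linearity explicit, the discrete problem can be restated in the variables $x_{m,n} = |\Phi_d(m,n)|^2$. The notation $p(m,n)$ for the predecessor of a grid point on its path toward the origin is introduced here for illustration only, and the bound $x_{m,n} \ge 0$ simply records that $x_{m,n}$ is a squared magnitude.

```latex
\begin{aligned}
\max_{x}\quad & \sum_{m}\sum_{n} |A_d(m,n)|^2 \, x_{m,n} \\
\text{subject to}\quad & x_{0,0} = 1, \\
& x_{p(m,n)} \ge x_{m,n} \quad \text{for all } (m,n) \ne (0,0), \\
& \sum_{m}\sum_{n} x_{m,n} \le \alpha_d, \\
& x_{m,n} \ge 0 .
\end{aligned}
```

The objective follows from $|A_d \Phi_d|^2 = |A_d|^2 |\Phi_d|^2$, and because $\Phi_d$ is real and positive, the kernel is nonincreasing along a path exactly when $x$ is.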
The optimal TFD can thus be determined as follows. First, the discrete AF of the signal to be analyzed is computed. Next, the linear program (10)-(12) is solved for the optimal kernel, which is then multiplied by the AF. The two-dimensional Fourier transform of the product is the optimal TFD.
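The three steps can be sketched end to end on a small grid. In the sketch below the kernel is a fixed one-zero placeholder rather than the solution of the linear program, and the index conventions, the unnormalized direct-summation transform, and all function names are assumptions made for illustration.

```python
import cmath

def ambiguity(s):
    """Discrete AF, A[m][k], with circular indexing (lag 2k, Doppler index m)."""
    N = len(s)
    return [[sum(s[(n + k) % N] * s[(n - k) % N].conjugate()
                 * cmath.exp(2j * cmath.pi * m * n / N) for n in range(N))
             for k in range(N)] for m in range(N)]

def dft2(C):
    """Direct two-dimensional DFT; adequate only for a small demonstration grid."""
    N = len(C)
    w = [cmath.exp(-2j * cmath.pi * q / N) for q in range(N)]
    return [[sum(C[m][k] * w[(m * n + k * f) % N]
                 for m in range(N) for k in range(N))
             for f in range(N)] for n in range(N)]

def tfd_from_af(A, kernel):
    """Weight the AF by the kernel, as in (3), and transform the product."""
    N = len(A)
    C = [[A[m][k] * kernel[m][k] for k in range(N)] for m in range(N)]
    return dft2(C)

N = 8
signal = [cmath.exp(1j * cmath.pi * 0.05 * n * n) for n in range(N)]
A = ambiguity(signal)                                  # step 1: discrete AF

# Step 2 (placeholder): a one-zero kernel supported near the origin.
# Indices wrap, so min(m, N - m) is the distance of index m from the origin.
kernel = [[1.0 if min(m, N - m) + min(k, N - k) <= 2 else 0.0
           for k in range(N)] for m in range(N)]

P = tfd_from_af(A, kernel)                             # step 3: transform
total = sum(sum(row) for row in P)
energy = sum(abs(x) ** 2 for x in signal)
```

Because the kernel equals one at the origin, the sum of all TFD samples in this sketch is $N^2$ times the signal energy, whatever the rest of the kernel looks like.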
3.1 Fast Algorithm for Solution
A solution for the optimal kernel using standard linear programming methods may be simple, but it is also computationally expensive. Use of the simplex algorithm would cause the optimal kernel computation to dominate the total cost of computing the optimal TFD. However, we have found an extremely efficient inductive procedure that computes the optimal kernel of size $\alpha$ with $O[N^2]$ operations. Since this number is small in comparison to the $O[N^2 \log N]$ computations required to find the AF or WD, time-frequency analysis with signal-dependent kernels is competitive computationally with traditional fixed kernel methods.
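The inductive procedure itself is not reproduced here. For orientation only, the sketch below shows a greedy heuristic that produces a feasible one-zero kernel: starting from the origin, it repeatedly admits the neighboring grid point with the largest AF energy whose predecessor is already in the support. It is an assumption-laden stand-in, is not guaranteed to be optimal, and uses a simplified radial tree and hypothetical function names.

```python
import heapq

def parent(m, n):
    """Predecessor of (m, n) on a simplified radial tree (an assumption)."""
    k = max(abs(m), abs(n))
    if k == 0:
        return None
    return (round(m * (k - 1) / k), round(n * (k - 1) / k))

def greedy_kernel(weights, volume):
    """Grow a one-zero kernel support outward from the origin.

    weights maps (m, n) -> |A_d(m, n)|^2 on a square grid centred at the
    origin; volume is the number of grid points allowed in the support.
    """
    children = {}
    for point in weights:
        if point != (0, 0):
            children.setdefault(parent(*point), []).append(point)
    support = {(0, 0)}                      # the kernel is one at the origin
    frontier = [(-weights[c], c) for c in children.get((0, 0), [])]
    heapq.heapify(frontier)                 # max-heap via negated weights
    while frontier and len(support) < volume:
        _, point = heapq.heappop(frontier)
        support.add(point)
        for c in children.get(point, []):
            heapq.heappush(frontier, (-weights[c], c))
    return support

# Synthetic AF energy concentrated along the diagonal m = n.
half = 3
weights = {(m, n): 1.0 / ((1 + (m - n) ** 2) * (1 + m * m + n * n))
           for m in range(-half, half + 1) for n in range(-half, half + 1)}
support = greedy_kernel(weights, volume=10)
```

Since a point enters the support only after its predecessor, the resulting kernel is nonincreasing along every path and satisfies the volume bound by construction.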
3.2 Implementation Issues
Although the one-zero kernel is optimal according to the constraints stated in section 2.2, its sharp cutoff may introduce ringing in the optimal TFD. Thus, some form of smoothing may be desired. One simple approach, used in the examples, tapers the kernel.
Adjustment of the parameter $\alpha_d$ controls the tradeoff between cross-component suppression and smearing of the auto-components. A lower bound on reasonable values for $\alpha$ can be derived from uncertainty principle arguments.
4. EXAMPLES
In order to compare the results of the optimal kernel design procedure with other TFD's, the optimal kernel was computed for the same signal discussed in the introduction. The AF and kernel were of size 64 x 64, the parameter $\alpha_d$ was set to 30, and tapering was applied. The AF, optimal kernel and resulting TFD are shown in Figs. 6, 7 and 8. The cross-component visible in all of the other TFD's except the matched-filter spectrogram (see Fig. 4) is virtually eliminated, yet the distribution is still quite concentrated, much more so than the matched-filter spectrogram.
Figure 9 illustrates the WD of the same signal corrupted by additive white Gaussian noise. The SNR of the resulting signal is 0 dB. The optimal kernel was computed using the same parameters as above. The cross-component and noise suppression of the optimal TFD, shown in Fig. 10, are excellent, indicating that the kernel design procedure is robust in the presence of significant additive noise.
5. CONCLUSION
An optimization procedure has been presented for the automatic determination of signal-dependent smoothing kernels for time-frequency analysis. Due to the signal-dependent nature of the kernel, the quality of the resulting time-frequency representation is insensitive to the time-scale and orientation of the signal. A fast algorithm for the optimal kernel solution makes the method competitive computationally with traditional fixed kernel methods. The procedure appears to yield excellent results for a much larger class of signals than any fixed kernel representation. The technique performs well even in the presence of substantial additive noise.
6. ACKNOWLEDGMENTS
This work was supported by the Music Group of the Computer-based Education Research Laboratory and the Joint Services Electronics Program, Grant No. N00014-90-J-1270.
7. REFERENCES
1. T.A.C.M. Claasen and W.F.G. Mecklenbräuker, "The Wigner Distribution - A Tool for Time-Frequency Signal Analysis - Part I: Continuous-Time Signals," Philips Journal of Research 35(3), pp. 217-250, 1980.
2. L. Cohen, "Time-Frequency Distributions - A Review," Proceedings of the IEEE 77(7), pp. 941-981, July 1989.
3. P. Flandrin and O. Rioul, "Affine Smoothing of the Wigner-Ville Distribution," IEEE ICASSP-1990, pp. 2455-2458, 1990.
4. H.-I. Choi and W.J. Williams, "Improved Time-Frequency Representation of Multicomponent Signals Using Exponential Kernels," IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-37(6), pp. 862-871, June 1989.
5. P. Flandrin, "Some Features of Time-Frequency Representations of Multicomponent Signals," IEEE ICASSP-1984, pp. 41.B.4.1-41.B.4.4, 1984.
6. S.C. Glinski, "Diphone Speech Synthesis Based on a Pitch-Adaptive Short-Time Fourier Transform," Ph.D. Thesis, University of Illinois, Urbana, 1981.
7. G. Jones and B. Boashash, "Instantaneous Frequency, Instantaneous Bandwidth and the Analysis of Multicomponent Signals," IEEE ICASSP-1990, pp. 2467-2470, 1990.
8. J.C. Andrieux, M.R. Felix, G. Mourgues, P. Bertrand, B. Izrar and V.T. Nguyen, "Optimum Smoothing of the Wigner-Ville Distribution," IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-35(6), pp. 764-769, June 1987.
9. S. Kadambe, G.F. Boudreaux-Bartels and P. Duvaut, "Window Length Selection for Smoothing the Wigner Distribution by Applying an Adaptive Filter Technique," IEEE ICASSP-1989, pp. 2226-2229, 1989.
10. A.H. Nuttall, "Wigner Distribution Function: Relation to Short-Term Spectral Estimation, Smoothing, and Performance in Noise," NUSC Technical Report 8225, February 16, 1988.
11. D.L. Jones and T.W. Parks, "A High Resolution Data-Adaptive Time-Frequency Representation," IEEE ICASSP-87, Dallas TX, pp. 681-684, April 1987.
12. D.G. Luenberger, Introduction to Linear and Nonlinear Programming, Addison-Wesley Co., Reading, MA, 1973.
Fig. 1. Magnitude of the Wigner distribution.
Fig. 2. Magnitude of the Choi-Williams distribution computed with smoothing parameter $\sigma = 20$.
Fig. 3. Magnitude of the spectrogram computed with a Hamming window of duration equal to the effective time width of the signal.
Fig. 4. Magnitude of the matched-filter spectrogram.
Fig. 5. Approximation of a radial dependency graph on a rectangular grid.
Fig. 6. Magnitude of the ambiguity function.
Fig. 7. The tapered optimal kernel, a