
Optimal kernels for time-frequency analysis

Richard G. Baraniuk Douglas L. Jones

Electrical and Computer Engineering

252 Engineering Research Laboratory

University of Illinois

Coordinated Science Laboratory

University of Illinois

1101 W. Springfield Ave.

Urbana, IL 61801 Urbana, IL 61801

(217) 333-0766 (217) 244-6823

ABSTRACT

Current bilinear time-frequency representations apply a fixed kernel to smooth the Wigner distribution. However, the choice of a fixed kernel limits the class of signals that can be analyzed effectively. This paper presents optimality criteria for the design of signal-dependent kernels that suppress cross-components while passing as much auto-component energy as possible, irrespective of the form of the signal. A fast algorithm for the optimal kernel solution makes the procedure competitive computationally with fixed kernel methods. Examples demonstrate the superior performance of the optimal kernel for a frequency modulated signal.

1. INTRODUCTION

Time-Frequency Distributions (TFD's), which indicate the energy content of a signal as a function of both time and frequency, are a powerful tool for time-varying signal analysis. The Wigner Distribution (WD)

$$W(t,\omega) = \int_{-\infty}^{\infty} s\left(t + \frac{\tau}{2}\right) s^{*}\left(t - \frac{\tau}{2}\right) e^{-j\omega\tau}\, d\tau \tag{1}$$

is of great interest due to a number of attractive properties [1]. However, it also has spurious cross-components and high noise sensitivity, both of which obscure the true signal features. Therefore, the WD is often convolved with a two-dimensional smoothing function that suppresses cross-components at the expense of signal energy concentration.

It is well known that all bilinear TFD's can be represented as smoothed versions of the WD [2]; that is, if $P(t,\omega)$ is a bilinear TFD, then

$$P(t,\omega) = W(t,\omega) \ast\ast\, \phi(t,\omega) \tag{2}$$

for some function $\phi$ ("$\ast\ast$" denotes two-dimensional convolution). Equation (2) may be rewritten using the two-dimensional inverse Fourier transform as

$$C(\theta,\tau) = A(\theta,\tau)\,\Phi(\theta,\tau), \tag{3}$$

where $C(\theta,\tau)$, the inverse Fourier transform of $P(t,\omega)$, is known as the characteristic function of the distribution; $A(\theta,\tau)$, the transform of the WD, is called the Ambiguity Function (AF); and $\Phi(\theta,\tau)$, the transform of the smoothing function, is known as the kernel of the TFD. The AF is also given directly by

$$A(\theta,\tau) = \int_{-\infty}^{\infty} s\left(t + \frac{\tau}{2}\right) s^{*}\left(t - \frac{\tau}{2}\right) e^{j\theta t}\, dt. \tag{4}$$

Equation (3) indicates that we can interpret any bilinear TFD as the two-dimensional Fourier transform of a weighted version of the AF.
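As an illustration of the AF definition (4), the sketch below evaluates a discrete ambiguity function by direct summation. It is only a sketch under stated assumptions: the function name `ambiguity`, the half-lag sampling $\tau = 2m$, the Doppler sampling $\theta = 2\pi k/N$, and the zero-padding outside the record are choices made here and are not taken from the paper. The direct sums cost $O(N^3)$; a practical implementation would use FFTs instead.

```python
import cmath

def ambiguity(s):
    """Discrete ambiguity function of the sequence s, by direct summation.

    Assumed discretization (not from the paper): the lag is sampled as
    tau = 2*m, so the local autocorrelation is s[n+m] * conj(s[n-m]),
    with samples outside the record treated as zero. The Doppler variable
    theta is sampled as 2*pi*k/N. Returns a dict keyed by (k, m).
    """
    N = len(s)
    idx = range(-(N // 2), N - N // 2)

    def x(n):
        return complex(s[n]) if 0 <= n < N else 0j

    A = {}
    for m in idx:
        # Local autocorrelation at half-lag m for every time index n.
        r = [x(n + m) * x(n - m).conjugate() for n in range(N)]
        for k in idx:
            A[(k, m)] = sum(
                r[n] * cmath.exp(2j * cmath.pi * k * n / N) for n in range(N)
            )
    return A

# Example: a short linear chirp with unit magnitude.
N = 16
chirp = [cmath.exp(1j * cmath.pi * 0.02 * n * n) for n in range(N)]
A = ambiguity(chirp)
energy = sum(abs(v) ** 2 for v in chirp)
```

Under this convention the value at the origin, `A[(0, 0)]`, equals the signal energy, and `A[(k, m)]` is the complex conjugate of `A[(-k, -m)]`.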

The kernel is frequently chosen to weight the AF such that the auto-components of the distribution are passed while the cross-components and noise are suppressed. In principle, this is possible when the auto-components and cross-components do not overlap. Many kernels have been proposed, but selection of a fixed kernel limits the class of signals for which the representation will perform well. That is, for any fixed kernel, it is always possible to find signals for which the TFD exhibits poor auto-component concentration or little cross-component suppression. (The same problem limits the performance of wavelet time-frequency analysis [3], since the choice of a fixed analyzing wavelet restricts the class of signals which can be analyzed effectively.)

SPIE Vol. 1348 Advanced Signal-Processing Algorithms, Architectures, and Implementations (1990) / 181

Downloaded from SPIE Digital Library on 18 Jan 2010 to 128.42.157.67. Terms of Use: http://spiedl.org/terms

The limitations of fixed kernel time-frequency analysis can be illustrated by analyzing a simple signal with several different TFD's. The WD of the sum of two chirp signals of large effective time envelope is shown in Fig. 1. Although the auto-components are highly concentrated, there is a large cross-component. The Choi-Williams distribution [4], which has an exponential kernel

$$\Phi_{CW}(\theta,\tau) = e^{-\theta^{2}\tau^{2}/\sigma}, \tag{5}$$

works well for signals whose components have distributions nearly parallel with the time or frequency axis. It performs poorly, however, for signals with substantial frequency modulation, as seen in Fig. 2, since the kernel severely truncates the auto-components of such signals. The kernel generating the spectrogram is related to the AF of the analysis window $w(t)$ by

$$\Phi_{S}(\theta,\tau) = \int_{-\infty}^{\infty} w\left(t + \frac{\tau}{2}\right) w^{*}\left(t - \frac{\tau}{2}\right) e^{j\theta t}\, dt. \tag{6}$$

Results are excellent for signal components that resemble the window [5], but all mismatched components are distorted. Figure 3 displays a spectrogram computed using a Hamming window of length similar to the effective time width of the signal. It is poorly concentrated and obscures the true nature of the signal. If the analysis window, and hence the kernel, is matched to one of the signal components, the matched-filter spectrogram results (see Fig. 4). While the matched-filter technique can yield excellent results, it works for only one type of signal component and requires a priori knowledge of the form of the component.
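The behaviour of the Choi-Williams kernel described above can be checked numerically. The sketch below assumes the standard exponential form $e^{-\theta^{2}\tau^{2}/\sigma}$ for the kernel and reuses the smoothing parameter value 20 quoted in the Fig. 2 caption. The function name and the sample points are hypothetical, chosen only for illustration.

```python
import math

def choi_williams_kernel(theta, tau, sigma):
    """Choi-Williams exponential kernel exp(-(theta*tau)^2 / sigma)."""
    return math.exp(-((theta * tau) ** 2) / sigma)

sigma = 20.0  # value quoted for Fig. 2

# On either axis the product theta*tau is zero, so the kernel weight is one.
on_axis = [
    choi_williams_kernel(0.0, 4.0, sigma),
    choi_williams_kernel(4.0, 0.0, sigma),
]

# Hypothetical auto-component ridge along the diagonal theta = tau, the kind
# of oblique line a chirp occupies in the AF plane: the weight drops quickly
# with distance from the origin, so the ridge is truncated.
along_ridge = [choi_williams_kernel(u, u, sigma) for u in (1.0, 2.0, 3.0, 4.0)]
```

Components lying along the $\theta$ or $\tau$ axis are passed unchanged, while energy along an oblique line through the origin is attenuated rapidly, which is consistent with the poor performance on frequency-modulated signals noted above.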

Since the best kernel function depends on the signal to be analyzed, we expect to obtain good performance for a broad class of signals only by using a signal-dependent kernel. Signal-dependent kernels have been proposed by several authors. The adaptive spectrogram representation for speech signals developed by Glinski [6] adapts the window based on a segmentation (provided by the user) of the signal into pitch periods. Jones and Boashash [7] adapt the modulation rate of a fixed window to match an estimate of the signal's instantaneous frequency. Optimal smoothing kernels are considered by Andrieux et al. [8], but only for simple signals of the form $s(t) = e^{j2\pi\nu(t)}$, and only for the restrictive class of Gaussian kernels. Kadambe, Boudreaux-Bartels and Duvaut [9] utilize an adaptive filtering technique coupled with AR modeling and clustering to design kernels. Nuttall [10] designs a kernel composed of Gaussian components based on information that the user provides after viewing the WD. Jones and Parks [11] develop a technique using Gaussian kernels which vary with time and frequency to maximize a local measure of signal-energy concentration.

Each of the methods described above either is ad hoc, excessively restricts the class of allowable kernels, is computationally expensive, or requires human intervention. We propose a new procedure for selecting a signal-dependent kernel. Given a signal, the method automatically designs a kernel that is optimal with respect to a set of performance criteria. Since the class of kernels that we consider is large, good performance is expected for a wide range of signals. The procedure also has a computational complexity that is comparable to fixed-kernel techniques.

2. OPTIMAL KERNEL DESIGN

Rather than choosing an ad hoc method for signal-dependent kernel selection, it seems appropriate to formulate the procedure as an optimization problem. The problem formulation requires a class of kernels from which the optimal kernel is chosen, and a performance index that measures the quality of the time-frequency representation with respect to criteria deemed important by the designer. The kernel that maximizes the value of the performance measure is selected as the optimal kernel for the signal.

The class of kernels must be large enough to allow for good performance for all signals of interest in a given application. Likewise, the performance measure must be chosen to yield a tractable optimization problem that can be solved efficiently. An example of a useful performance index is a measure of the signal-energy concentration of the distribution [11]. Clearly, the choice of kernel class and performance measure is crucial to the success of the method. However, once a satisfactory class and measure are found, kernel design for a wide range of signals is reduced to solving an optimization problem.

The optimal design concept can be generalized to classes of TFD's other than the bilinear by defining a subclass of allowable TFD's and a performance index. The formulation of optimization problems for TFD design is relatively simple in the bilinear case, because a bilinear TFD is completely specified by its two-dimensional smoothing kernel. Thus, we can find the optimal bilinear TFD for a signal simply by solving for the optimal kernel.

2.1 Continuous optimization formulation

This section develops an optimization problem formulation for kernel design that relies on the AF of the signal and the characteristic function representation of a TFD indicated by (3). We propose optimality criteria based on the AF for three reasons. First, the multiplicative operation of the kernel on the AF is easier to visualize than convolution of the WD with the smoothing function, which simplifies the construction of a quality measure. Second, the AF serves to separate the auto- and cross-components [5]. Third, the AF may lead to efficient computation of the optimal TFD, since the TFD is merely the two-dimensional Fourier transform of the product of the optimal kernel and the AF.

We consider an optimal kernel in the continuous case to be one that satisfies the following optimization problem:

$$\max_{\Phi} \int_{0}^{2\pi}\!\!\int_{0}^{\infty} |A(r,\psi)\,\Phi(r,\psi)|^{2}\, r\, dr\, d\psi \tag{7}$$

subject to $\Phi(0,0) = 1$ and

$$\Phi(r_1,\psi) \ge \Phi(r_2,\psi) \quad \forall\, r_1 < r_2,\ \forall\, \psi, \tag{8}$$

and subject to

$$\int_{0}^{2\pi}\!\!\int_{0}^{\infty} |\Phi(r,\psi)|^{2}\, r\, dr\, d\psi \le \alpha, \tag{9}$$

where $A(r,\psi)$ is the AF of the signal in polar coordinates, and the kernel $\Phi(r,\psi)$ is assumed to be real and positive. The performance measure (7) expresses our desire to pass as much auto-component energy as possible into the TFD for a kernel of fixed volume $\alpha$. The second constraint (8) forces the kernel to be radially nonincreasing. Since the AF auto-components are centered at the origin, this encourages the kernel to preferentially pass auto-components. The final constraint (9) restricts the size of the kernel so that cross-components are suppressed. An advantage of this formulation is that the constraints are insensitive to both the time-scale and orientation of the signal in time-frequency.

2.2 Discrete optimization formulation

In practice, TFD's are computed at discrete time and frequency locations, so we reformulate the optimization problem by discretizing equations (7)-(9). With suitably dense sampling, the discrete formulation converges to the continuous formulation. Performing the discretization, we define an optimal discrete kernel to be one that satisfies:

$$\max_{\Phi_d} \sum_{m}\sum_{n} |A_d(m,n)\,\Phi_d(m,n)|^{2}, \tag{10}$$

subject to $\Phi_d(0,0) = 1$ and

$$\Phi_d(m,n) \text{ is radially nonincreasing}, \tag{11}$$

and subject to

$$\sum_{m}\sum_{n} |\Phi_d(m,n)|^{2} \le \alpha_d, \tag{12}$$

where $A_d(m,n)$ is the $N \times N$ discrete AF of the signal to be analyzed, and the kernel $\Phi_d(m,n)$ is assumed to be real and positive. Note that since the AF is conjugate symmetric through the origin,

$$A_d(m,n) = A_d^{*}(-m,-n), \tag{13}$$

the optimal kernel can be computed from a half-plane of AF samples.

The constraint that the kernel be radially nonincreasing can be implemented exactly only on a polar grid of samples. However, computing the AF and resulting TFD on a polar grid requires either the computation of a polar Fourier transform, for which no fast algorithm exists, or a costly interpolation from a rectangular grid. Therefore, we approximate the polar grid by a set of paths on a rectangular grid. Figure 5 illustrates a tree structure that approximates the radial dependencies of the kernel for the upper half-plane of a 64 x 64 rectangular grid. The nonincreasing constraint is enforced along each path from the origin to the edge. The branches of the tree are constructed to minimize the maximum deviation from the branch to the true radial line.

3. OPTIMAL KERNEL SOLUTION

Since the performance measure and constraints are linear in $|\Phi_d|^2$, the optimal kernel may be found by applying linear programming [12] to solve for the $N^2$ unknowns $|\Phi_d(m,n)|^2$ (since $\Phi_d$ is assumed to be real and positive, knowing $|\Phi_d|^2$ is equivalent to knowing $\Phi_d$). Moreover, it can be shown that the optimal kernel takes on essentially only the values of one and zero.
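To make the one-zero structure concrete, the following sketch solves a toy version of the discrete problem exactly. It rests on several assumptions that are not from the paper: the radial paths are independent spokes that share only the origin (unlike the tree of Fig. 5), the search is restricted to one-zero kernels, and the solver is a small dynamic program rather than linear programming or the authors' fast procedure. The function name `best_one_zero_kernel` and the sample weights are hypothetical.

```python
def best_one_zero_kernel(spokes, budget):
    """Best one-zero, radially nonincreasing kernel on independent spokes.

    spokes: list of lists of |A_d|^2 weights, ordered outward from the
            origin (the origin itself is excluded; its kernel value is 1).
    budget: maximum number of non-origin samples that may be set to one.

    A one-zero kernel that is nonincreasing along a spoke equals one on a
    prefix of that spoke, so only the prefix length per spoke is unknown.
    Returns (passed_energy, prefix_lengths).
    """
    # best[b] = (energy, lengths) using at most b samples on the spokes seen so far.
    best = [(0.0, [])] * (budget + 1)
    for weights in spokes:
        prefix = [0.0]
        for w in weights:
            prefix.append(prefix[-1] + w)
        new_best = []
        for b in range(budget + 1):
            candidates = (
                (best[b - k][0] + prefix[k], best[b - k][1] + [k])
                for k in range(min(b, len(weights)) + 1)
            )
            new_best.append(max(candidates, key=lambda t: t[0]))
        best = new_best
    return best[budget]

# Two spokes with energy near the origin (auto-component-like) and one spoke
# whose energy sits far from the origin (cross-component-like).
spokes = [[5.0, 1.0, 1.0], [2.0, 4.0, 0.5], [0.1, 0.1, 9.0]]
energy, lengths = best_one_zero_kernel(spokes, budget=3)
```

With a budget of three samples the kernel covers one sample on the first spoke and two on the second, and leaves the distant peak on the third spoke outside its support, because reaching that peak would spend the whole budget on a path with little energy near the origin.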

The optimal TFD can thus be determined as follows. First, the discrete AF of the signal to be analyzed is computed. Next, the linear program (10)-(12) is solved for the optimal kernel, which is then multiplied by the AF. The two-dimensional Fourier transform of the product is the optimal TFD.
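The last step of this procedure, weighting the AF by the kernel and transforming the product, can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the AF is sampled at $\theta = 2\pi k/N$ and $\tau = 2m$ and stored in a dict keyed by `(k, m)`, and it uses direct sums in place of FFTs. The function name `tfd_from_af` is hypothetical, and a trivial all-pass kernel stands in for the optimal kernel.

```python
import cmath

def tfd_from_af(A, kernel, N):
    """Weight the ambiguity function by the kernel and transform to a TFD.

    A and kernel are dicts keyed by (k, m): k indexes the Doppler variable
    theta = 2*pi*k/N and m the half-lag (tau = 2*m), both running over
    -N//2 .. N - N//2 - 1. This index convention is an assumption of the
    sketch. Returns a dict P keyed by (n, l) = (time index, frequency bin).
    """
    idx = range(-(N // 2), N - N // 2)
    P = {}
    for n in range(N):
        # Inverse transform over k: kernel-weighted local autocorrelation at time n.
        r = {
            m: sum(
                A[(k, m)] * kernel[(k, m)] * cmath.exp(-2j * cmath.pi * k * n / N)
                for k in idx
            ) / N
            for m in idx
        }
        # Forward transform over the lag index m gives the frequency axis;
        # the real part is kept, as for a real, symmetric kernel.
        for l in range(N):
            P[(n, l)] = sum(
                r[m] * cmath.exp(-2j * cmath.pi * l * m / N) for m in idx
            ).real
    return P

# Usage: under this convention a unit impulse at n0 has an AF that is a pure
# phase along the zero-lag axis and zero elsewhere.
N, n0 = 8, 3
idx = range(-(N // 2), N - N // 2)
A = {
    (k, m): (cmath.exp(2j * cmath.pi * k * n0 / N) if m == 0 else 0j)
    for k in idx
    for m in idx
}
all_pass = {key: 1.0 for key in A}  # placeholder for an optimal one-zero kernel
P = tfd_from_af(A, all_pass, N)
```

With the all-pass kernel the result is concentrated entirely at time index `n0` and is flat across frequency, as expected for an impulse. Substituting a one-zero kernel in place of `all_pass` gives the corresponding smoothed distribution.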

3.1 Fast Algorithm for Solution

A solution for the optimal kernel using standard linear programming methods may be simple, but it is also computationally expensive. Use of the simplex algorithm would cause the optimal kernel computation to dominate the total cost of computing the optimal TFD. However, we have found an extremely efficient inductive procedure that finds the optimal kernel of size $\alpha$ with $O[N^2]$ operations. Since this number is small in comparison to the $O[N^2 \log N]$ computations required to find the AF or WD, time-frequency analysis with signal-dependent kernels is competitive computationally with traditional fixed kernel methods.

3.2 Implementation Issues

Although the one-zero kernel is optimal according to the constraints stated in section 2.2, its sharp cutoff may introduce ringing in the optimal TFD. Thus, some form of smoothing may be desired. One simple approach, used in the examples, tapers the kernel.

Adjustment of the parameter $\alpha_d$ controls the tradeoff between cross-component suppression and smearing of the auto-components. A lower bound on reasonable values for $\alpha$ can be derived from uncertainty principle arguments.

4. EXAMPLES

In order to compare the results of the optimal kernel design procedure with other TFD's, the optimal kernel was computed for the same signal discussed in the introduction. The AF and kernel were of size 64 x 64, the parameter $\alpha_d$ was set to 30, and tapering was applied. The AF, optimal kernel and resulting TFD are shown in Figs. 6, 7 and 8. The cross-component visible in all of the other TFD's except the matched-filter spectrogram (see Fig. 4) is virtually eliminated, yet the distribution is still quite concentrated, much more so than the matched-filter spectrogram.

Figure 9 illustrates the WD of the same signal corrupted by additive white Gaussian noise. The SNR of the resulting signal is 0 dB. The optimal kernel was computed using the same parameters as above. The cross-component and noise suppression of the optimal TFD, shown in Fig. 10, are excellent, indicating that the kernel design procedure is robust in the presence of significant additive noise.

5. CONCLUSION

An optimization procedure has been presented for the automatic determination of signal-dependent smoothing kernels for time-frequency analysis. Due to the signal-dependent nature of the kernel, the quality of the resulting time-frequency representation is insensitive to the time-scale and orientation of the signal. A fast algorithm for the optimal kernel solution makes the method competitive computationally with traditional fixed kernel methods. The procedure appears to yield excellent results for a much larger class of signals than any fixed kernel representation. The technique performs well even in the presence of substantial additive noise.



6. ACKNOWLEDGMENTS

This work was supported by the Music Group of the Computer-based Education Research Laboratory and the Joint Services Electronics Program, Grant No. N00014-90-J-1270.

7. REFERENCES

1. T.A.C.M. Claasen and W.F.G. Mecklenbräuker, "The Wigner Distribution - A Tool for Time-Frequency Signal Analysis - Part I: Continuous-Time Signals," Philips Journal of Research 35(3), pp. 217-250, 1980.

2. L. Cohen, "Time-Frequency Distributions - A Review," Proceedings of the IEEE 77(7), pp. 941-981, July 1989.

3. P. Flandrin and O. Rioul, "Affine Smoothing of the Wigner-Ville Distribution," IEEE ICASSP-1990, pp. 2455-2458, 1990.

4. H.-I. Choi and W.J. Williams, "Improved Time-Frequency Representation of Multicomponent Signals Using Exponential Kernels," IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-37(6), pp. 862-871, June 1989.

5. P. Flandrin, "Some Features of Time-Frequency Representations of Multicomponent Signals," IEEE ICASSP-1984, pp. 41.B.4.1-41.B.4.4, 1984.

6. S.C. Glinski, "Diphone Speech Synthesis Based on a Pitch-Adaptive Short-Time Fourier Transform," Ph.D. Thesis, University of Illinois, Urbana, 1981.

7. G. Jones and B. Boashash, "Instantaneous Frequency, Instantaneous Bandwidth and the Analysis of Multicomponent Signals," IEEE ICASSP-1990, pp. 2467-2470, 1990.

8. J.C. Andrieux, M.R. Felix, G. Mourgues, P. Bertrand, B. Izrar and V.T. Nguyen, "Optimum Smoothing of the Wigner-Ville Distribution," IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-35(6), pp. 764-769, June 1987.

9. S. Kadambe, G.F. Boudreaux-Bartels and P. Duvaut, "Window Length Selection for Smoothing the Wigner Distribution by Applying an Adaptive Filter Technique," IEEE ICASSP-1989, pp. 2226-2229, 1989.

10. A.H. Nuttall, "Wigner Distribution Function: Relation to Short-Term Spectral Estimation, Smoothing, and Performance in Noise," NUSC Technical Report 8225, February 16, 1988.

11. D.L. Jones and T.W. Parks, "A High Resolution Data-Adaptive Time-Frequency Representation," IEEE ICASSP-87, Dallas TX, pp. 681-684, April 1987.

12. D.G. Luenberger, Introduction to Linear and Nonlinear Programming, Addison-Wesley Co., Reading, MA, 1973.



Fig. 1. Magnitude of the Wigner distribution.

Fig. 2. Magnitude of the Choi-Williams distribution computed with smoothing parameter $\sigma = 20$.

Fig. 3. Magnitude of the spectrogram computed with a Hamming window of duration equal to the effective time width of the signal.

Fig. 4. Magnitude of the matched-filter spectrogram.

Fig. 5. Approximation of a radial dependency graph on a rectangular grid.



Fig. 6. Magnitude of the ambiguity function.

Fig. 7. The tapered optimal kernel, a
