18.07.2013 Views

Overview of Lecture STFA and STFS Definition of STFT Short Time ...

Overview of Lecture STFA and STFS Definition of STFT Short Time ...

Overview of Lecture STFA and STFS Definition of STFT Short Time ...

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Digital Signal Processing<br />

Design g — <strong>Lecture</strong> 6<br />

<strong>Short</strong> <strong>Short</strong>-<strong>Time</strong> <strong>Time</strong> Fourier<br />

Analysis Methods<br />

<strong>Overview</strong> <strong>of</strong> <strong>Lecture</strong><br />

• define time time-varying varying Fourier transform<br />

(<strong>STFT</strong>) analysis method<br />

• define synthesis method from timevarying<br />

y g FT ( (filter-bank summation, , overlap p<br />

addition)<br />

• show how time-varying FT can be viewed<br />

in terms <strong>of</strong> a bank <strong>of</strong> filters model<br />

• computation methods based on using<br />

FFT<br />

<strong>Definition</strong> <strong>of</strong> <strong>STFT</strong><br />

∞<br />

∑<br />

jω − jωm X ˆ ˆ<br />

nˆ<br />

( e ) = x( m) w( n−m) e both n <strong>and</strong> ω are variables<br />

m=−∞<br />

• wn ( ˆ −m)<br />

is a real window which determines the portion <strong>of</strong> xn ( ˆ)<br />

jω<br />

that is used in the computation <strong>of</strong> X ( e )<br />

nˆ<br />

1<br />

3<br />

5<br />

x[ n]<br />

General Discrete Discrete-<strong>Time</strong> <strong>Time</strong> Model <strong>of</strong><br />

Speech Production<br />

<strong>STFA</strong><br />

Speech<br />

Analysis<br />

Voiced Speech:<br />

• AVP(z)G(z)V(z)R(z) P(z)G(z)V(z)R(z)<br />

Unvoiced Speech:<br />

• ANN(z)V(z)R(z) N(z)V(z)R(z)<br />

<strong>STFA</strong> <strong>and</strong> <strong>STFS</strong><br />

X e<br />

j ˆ ω<br />

n(<br />

)<br />

Signal<br />

Processing<br />

n<br />

j ˆ<br />

Side Info<br />

(Pitch, V/U, …)<br />

Y ( e )<br />

ω<br />

<strong>STFS</strong><br />

<strong>Short</strong> <strong>Time</strong> Fourier Transform<br />

• <strong>STFT</strong> is a function <strong>of</strong> two variables, the time index, nˆ<br />

, which<br />

is discrete, <strong>and</strong> the frequency variable, ω,<br />

which is continuous<br />

nˆ<br />

∞<br />

jω = ∑<br />

m=−∞<br />

ˆ −<br />

− jωm X ( e ) x( m) w( n m) e<br />

= DTFT ( xmwn ( ) ( ˆ −m)) ⇒ nˆ<br />

fixed, ω variable<br />

nˆ<br />

2<br />

yn [ ]<br />

4<br />

6<br />

1


Interpretations <strong>of</strong> <strong>STFT</strong><br />

jω<br />

• there are 2 distinct interpretations <strong>of</strong> X ( e )<br />

jω<br />

1.<br />

assume nˆis fixed, then Xnˆ( e ) is simply the normal Fourier<br />

transform <strong>of</strong> the sequence wn ( ˆ −mxm ) ( ), −∞ < m<<br />

∞ => for<br />

jω<br />

fixed nˆ, X ˆ ( e ) has the same<br />

properties as a normal Fourier<br />

n<br />

transform a s o<br />

j ˆ ω<br />

2. consider Xn( e ) as a function <strong>of</strong> the time index n with ˆ ω fixed.<br />

j ˆ ω − j ˆ ωn<br />

Then Xn( e ) is in the form <strong>of</strong> a convolution <strong>of</strong> the signal x( n) e<br />

with the window<br />

wn ( ). This leads to an interpretation in the form <strong>of</strong><br />

− j ˆ ωn<br />

linear filtering <strong>of</strong> the frequency modulated signal xne ( ) by wn ( ).<br />

- we will now consider each <strong>of</strong> these interpretations <strong>of</strong> the <strong>STFT</strong> in<br />

a lot more detail<br />

Signal Recovery from <strong>STFT</strong><br />

jω<br />

• since for a given value <strong>of</strong> nX ˆ,<br />

nˆ<br />

( e ) has the same properties as a<br />

normal Fourier transform, we can recover the input sequence exactly<br />

jω<br />

• since Xnˆ( e ) is the normal Fourier transform <strong>of</strong> the windowed<br />

sequence wn ( ˆ − mxm ) ( ), then<br />

π<br />

1<br />

j jω j jωm wn ( ˆ − mxm ) ( ) =<br />

2 ∫ ˆ ( ) ω<br />

π ∫ Xn e e d<br />

−π<br />

• assuming the window satisfies the property that w(<br />

0) ≠ 0 ( a trivial<br />

requirement), then by evaluating the inverse Fourier transform<br />

when m = nˆ,<br />

we obtain<br />

π<br />

1<br />

jω jωnˆ xn ( ˆ)<br />

= ˆ(<br />

) ω<br />

2 ( 0)<br />

∫ Xn e e d<br />

w<br />

π −π<br />

Alternative Forms <strong>of</strong> <strong>STFT</strong><br />

jω<br />

Alternative forms <strong>of</strong> X ( e )<br />

n<br />

1.<br />

real <strong>and</strong> imaginary parts<br />

jω jω jω<br />

X = ⎡<br />

⎣<br />

⎤<br />

⎦<br />

+ ⎡<br />

⎣<br />

⎤<br />

n( e ) Re Xn( e ) jIm Xn( e )<br />

⎦<br />

= an( ω) − jbn(<br />

ω)<br />

jω<br />

a ( ω)<br />

= Re ⎣<br />

⎡ ( ) ⎦<br />

⎤<br />

n n( ) ⎣X n n(<br />

e ) ⎦<br />

jω<br />

b ( ω)<br />

=−Im ⎡<br />

⎣ ( ) ⎤<br />

n Xn e ⎦<br />

• when xm ( ) <strong>and</strong> wn ( −m)<br />

are both real (usually the case)<br />

can show that an( ω) is symmetric in ω, <strong>and</strong> bn(<br />

ω)<br />

is<br />

anti-symmetric in ω<br />

2. magnitude <strong>and</strong> phase<br />

jω jω<br />

jθn(<br />

ω)<br />

Xn( e ) = | Xn( e ) | e<br />

jω<br />

• can relate | X ( e ) | <strong>and</strong> θ ( ω) to a ( ω) <strong>and</strong> b ( ω)<br />

nˆ<br />

n n n n<br />

7<br />

9<br />

11<br />

Frequencies for <strong>STFT</strong><br />

• the <strong>STFT</strong> is periodic in ω with period 2 π,<br />

i.e.,<br />

jω j( ω+ 2πk)<br />

Xnˆ( e ) = Xnˆ( e ), ∀k<br />

• can use any <strong>of</strong> several frequency variables to express <strong>STFT</strong>,<br />

including<br />

ω =ΩT (where T is the sampling period for x( m))<br />

to represent<br />

jΩT analog radian frequency, giving Xn( e )<br />

ω = 2πf or ω = 2πFT to represent normalized frequency (0 ≤f ≤1)<br />

j2πf j2πFT or analog frequency (0 ≤F ≤ F = / T), giving X ( e ) or X ( e )<br />

s 1 nˆ nˆ<br />

Signal Recovery from <strong>STFT</strong><br />

π<br />

1<br />

jω jωnˆ xn ( ˆ)<br />

=<br />

ˆ ( ) ω<br />

2π( 0)<br />

∫ Xn e e d<br />

w −π<br />

• with the requirement that w( 0) ≠ 0,<br />

the sequence x( nˆ)<br />

jω jω<br />

can be recovered exactly from Xnˆ( e ), if Xnˆ( e ) is known<br />

for all values <strong>of</strong> ω over one complete period<br />

- sample-by-sample recovery process<br />

jω<br />

- X ˆ<br />

nˆ<br />

( e ) must be known for every value <strong>of</strong> n <strong>and</strong> for all ω<br />

i can also recover sequence wn ( ˆ − mxm ) ( ) but can't guarantee<br />

that xm ( ) can be recovered since wn ( ˆ − m) can<br />

equal 0<br />

Windows in <strong>STFT</strong><br />

jω<br />

• for Xn( e ) to represent the short-time spectral properties <strong>of</strong> x( n)<br />

jθ<br />

inside the window => We ( ) should be much narrower in frequency<br />

jω<br />

than significant spectral regions <strong>of</strong> Xe ( )--i.e., almost an impulse<br />

in frequency<br />

• consider rectangular g <strong>and</strong> Hamming g windows, where width <strong>of</strong> the<br />

main spectral lobe is inversely proportional to window length, <strong>and</strong> side<br />

lobe levels are essentially independent <strong>of</strong> window length<br />

Rectangular Window: flat window <strong>of</strong> length L samples; first<br />

zero in frequency response occurs at FS/L, with sidelobe levels<br />

<strong>of</strong> -14 dB or lower<br />

Hamming Window: raised cosine window <strong>of</strong> length L<br />

samples; first zero in frequency response occurs at 2FS/L, with<br />

sidelobe levels <strong>of</strong> -40 dB or lower<br />

8<br />

10<br />

12<br />

2


<strong>Time</strong> <strong>and</strong> Frequency Responses <strong>of</strong><br />

L=2M+1 Point Hamming Window<br />

Effect <strong>of</strong> Window Length Length-HW HW<br />

Effect <strong>of</strong> Window Length Length-HW HW<br />

13<br />

15<br />

17<br />

Effect <strong>of</strong> Window Length<br />

Effect <strong>of</strong> Window Length Length-RW RW<br />

Linear Filtering Interpretation<br />

1.<br />

modulation-lowpass filter form<br />

n(<br />

∞<br />

j ˆ ω<br />

) = ∑<br />

m=−∞<br />

( )<br />

− j ˆ ωm<br />

( − )<br />

X e x m e w n m<br />

− j ˆ ωn<br />

( )<br />

π<br />

∫<br />

= wn ( ) ∗ xne ( ) , nvariable,<br />

ˆ ω fixed<br />

1<br />

j jθθ j ( θθ + ˆ ωω ) j jθθ n<br />

= We ( ) Xe ( ) e d dθθ<br />

2π<br />

−π<br />

2.<br />

b<strong>and</strong>pass filter-demodulation<br />

n(<br />

∞<br />

j ˆ ω<br />

) = ∑<br />

m=−∞<br />

( ) ( − )<br />

−j ˆ ω(<br />

n−m) =<br />

∞<br />

− j ˆ ωn ∑<br />

m=−∞<br />

j ˆ ωm<br />

−<br />

− j ˆ ωn j ˆ ωn<br />

X e w m x n m e<br />

e ( w( m) e ) x( n m)<br />

= e [( w( n) e ) ∗x(<br />

n)], n<br />

variable, ˆ ω fixed<br />

14<br />

16<br />

18<br />

3


Linear Filtering Interpretation<br />

1. modulation-lowpass filter form:<br />

∞<br />

j ˆ ω<br />

n(<br />

) = ∑<br />

m=−∞<br />

(<br />

− j ˆ ωm<br />

) ( − ), variable, ˆ ω fixed<br />

X e x m e w n m n<br />

− jωn ( xne ( ) ) wn ( )<br />

( ˆ ω ) ( ˆ ω )<br />

= ∗<br />

= xn ( )cos( n) ∗wn ( ) − j xn ( )sin( n) ∗wn<br />

( )<br />

= a ( ˆ ω) − jb ( ˆ ω)<br />

n n<br />

Linear Filtering Interpretation<br />

Sampling Rate in <strong>Time</strong><br />

• to determine the sampling rate in time, we take a linear filtering view<br />

j ˆ ω<br />

1. Xn( e ) is the output <strong>of</strong> a filter with impulse response w( n)<br />

jω<br />

2. We ( ) is a lowpass response with effective b<strong>and</strong>width<br />

<strong>of</strong> B Hertz<br />

j ˆ ω j ˆ ω<br />

• thus the effective b<strong>and</strong>width <strong>of</strong> Xn( e ) is B Hertz => Xn( e ) has<br />

to be sampled at a rate <strong>of</strong> 2B<br />

samples/second to avoid aliasing<br />

Example: Hamming Window ( L = 100 samples; L = 400 samples)<br />

2Fs<br />

⇒B ≈ (Hz); for L = 100, Fs= 10, 000 Hz => B = 200 Hz => need<br />

L<br />

rate <strong>of</strong> 400/sec (every R = 25 samples) for sampling rate in time (wideb<strong>and</strong> analysis)<br />

⇒ for L = 400, FS= 10, 000 Hz ⇒ B = 50 Hz ⇒ need rate <strong>of</strong> R = 100/sec<br />

(every 100 samples) for sampling rate in time (narrowb<strong>and</strong> analysis)<br />

19<br />

21<br />

23<br />

Linear Filtering Interpretation<br />

2. b<strong>and</strong>pass filter-demodulation form<br />

X e<br />

−<br />

e ⎡<br />

⎣( w n e ) x n ⎤<br />

⎦<br />

n<br />

j ˆ ω j ˆ ωn j ˆ ωn<br />

n(<br />

) = ( ) ∗ ( ) , variable, ω fixed<br />

• complex b<strong>and</strong>pass filter output<br />

− jωn modulated by signal e<br />

jθ<br />

• if We ( ) is lowpass, then filter<br />

is b<strong>and</strong>pass around θ = ω<br />

• all real computation for lower<br />

half structure<br />

Sampling Rates <strong>of</strong> <strong>STFT</strong><br />

• need to sample <strong>STFT</strong> in both time <strong>and</strong> frequency to<br />

produce an unaliased representation from which x(n) can<br />

be exactly recovered<br />

• sampling rates lower than the theoretical minimum rate<br />

can be used, in either time or frequency, <strong>and</strong> x(n) can<br />

still till bbe exactly tl recovered dffrom th the aliased li d( (under- d<br />

sampled) short-time transform<br />

– this is useful for spectral estimation, pitch estimation, formant<br />

estimation, speech spectrograms, vocoders<br />

– for applications where the signal is modified, e.g., speech<br />

enhancement, cannot undersample <strong>STFT</strong> <strong>and</strong> still recover<br />

modified signal exactly<br />

Sampling Rate in Frequency<br />

jω<br />

• since Xnˆ( e ) is periodic in ω with period 2π,<br />

it is only necessary to sample over an<br />

interval <strong>of</strong> length 2π<br />

• need to determine an appropriate finite set <strong>of</strong> frequencies, ωk= 2π k / N, k = 01 , ,..., N −1<br />

jω<br />

at which Xnˆ( e ) must be specified to exactly recover x( n)<br />

jω<br />

• use the Fourier transform interpretation <strong>of</strong> Xnˆ( e )<br />

jω<br />

1. if the window wn ( ) is time-limited, then the inverse transform <strong>of</strong> Xnˆ( e ) is time-limited<br />

j jωω<br />

22. the sampling theorem requres that we sample X Xnˆ ( e ) in the frequency dimension ata at a<br />

rate <strong>of</strong> at least twice its ('symmetric') "time width"<br />

jω<br />

3. since the inverse Fourier transform <strong>of</strong> X ˆ<br />

ˆ ( e ) is the signal x( m) w( n−m) <strong>and</strong> this signal<br />

n<br />

is <strong>of</strong> duration L samples (the duration <strong>of</strong> w( n)),<br />

then according to the sampling theorem<br />

jω<br />

Xnˆ( e ) must be sampled (in frequency) at the set <strong>of</strong> frequencies<br />

2π<br />

k<br />

ω k = , k = 01 , ,..., L−1(where L/<br />

2 is the effective width <strong>of</strong> the window)<br />

L<br />

jωk<br />

in order to exactly recover xn ( ) from X ˆ ( e ) (see Prob. 6.8)<br />

n<br />

• thus for a Hamming window <strong>of</strong> duration L=100 samples, we require that the <strong>STFT</strong> be<br />

evaluated at at least 100 uniformly spaced frequencies around the unit circle<br />

ˆ<br />

20<br />

22<br />

24<br />

4


“Total” Sampling Rate <strong>of</strong> <strong>STFT</strong><br />

• the “total” sampling rate for the <strong>STFT</strong> is the product <strong>of</strong> the sampling<br />

rates in time <strong>and</strong> frequency, i.e.,<br />

SR = SR(time) x SR(frequency)<br />

= 2B x L samples/sec<br />

B = frequency b<strong>and</strong>width <strong>of</strong> window (Hz)<br />

L = time width <strong>of</strong> window (samples)<br />

• for most windows <strong>of</strong> interest, , B is a multiple p <strong>of</strong> FS/L, S , i.e., ,<br />

B = C FS/L (Hz), C=1 for Rectangular Window<br />

C=2 for Hamming Window<br />

SR = 2C FS samples/second<br />

• can define an ‘oversampling rate’ <strong>of</strong><br />

SR/ FS = 2C = oversampling rate <strong>of</strong> <strong>STFT</strong> as compared to<br />

conventional sampling representation <strong>of</strong> x(n)<br />

for RW, 2C=2; for HW 2C=4 => range <strong>of</strong> oversampling is 2-4<br />

this oversampling gives a very flexible representation <strong>of</strong> the speech signal<br />

Spectrographic Displays<br />

• Sound Spectrograph-one <strong>of</strong> the earliest embodiments <strong>of</strong> the timedependent<br />

spectrum analysis techniques<br />

– 2-second utterance repeatedly modulates a variable frequency<br />

oscillator, then b<strong>and</strong>pass filtered, <strong>and</strong> the average energy at a given<br />

time <strong>and</strong> frequency is measured <strong>and</strong> used as a crude measure <strong>of</strong> the<br />

<strong>STFT</strong><br />

– thus energy is recorded by an ingenious electro-mechanical system on<br />

special electrostatic paper called teledeltos paper<br />

– result is a two-dimensional representation <strong>of</strong> the time-dependent<br />

spectrum-with vertical intensity being spectrum level at a given<br />

frequency, <strong>and</strong> horizontal intensity being spectral level at a given timewith<br />

spectrum magnitude being represented by the darkness <strong>of</strong> the<br />

marking<br />

– wide b<strong>and</strong>pass filters (300 Hz b<strong>and</strong>width) provide good temporal<br />

resolution <strong>and</strong> poor frequency resolution (resolve pitch pulses in time<br />

but not in frequency)—called wideb<strong>and</strong> spectrogram<br />

– narrow b<strong>and</strong>pass filters (45 Hz b<strong>and</strong>width) provide good frequency<br />

resolution <strong>and</strong> poor time resolution (resolve pitch pulses in frequency,<br />

but not in time)—called narrowb<strong>and</strong> spectrogram<br />

27<br />

Sequence <strong>of</strong> Spectrograms <strong>of</strong> Utterance “This is a test”<br />

3 msec (48 sample)<br />

analysis window<br />

25<br />

6 msec (96 sample)<br />

analysis window<br />

9 msec (144 sample)<br />

analysis window<br />

30 msec (480<br />

sample) analysis<br />

window<br />

29<br />

Sampling the <strong>STFT</strong><br />

Speech Spectrograms<br />

• wideb<strong>and</strong> spectrogram<br />

• follows broad spectral peaks (formants)<br />

over time<br />

• resolves most individual pitch periods as<br />

vertical striations since the IR <strong>of</strong> the<br />

analyzing filter is comparable in duration<br />

to a pitch period<br />

• what happens for low pitch males—high<br />

pitch females<br />

• for unvoiced speech there are no vertical<br />

pitch striations<br />

• narrowb<strong>and</strong> spectrogram<br />

• individual harmonics are resolved in<br />

voiced regions<br />

• formant frequencies are still in evidence<br />

• usually can see fundamental frequency<br />

• unvoiced regions show no strong<br />

structure<br />

28<br />

Spectrogram Comparisons<br />

26<br />

30<br />

5


Wideb<strong>and</strong> Spectrogram - Male<br />

nfft = 1024, L = 80 / 800, Overlap = 75 / 790<br />

Overlap Addition (OLA) Method<br />

• based on normal FT interpretation <strong>of</strong> short-time spectrum<br />

jωk<br />

DFT / IDFT<br />

X ˆ<br />

nˆ( e ) ←⎯⎯⎯⎯→ ynˆ( m) = x( m) w( n−m) jωk<br />

• can reconstruct x(m) by computing IDFT <strong>of</strong> Xnˆ ( e ) <strong>and</strong><br />

dividing out the window (assumed non-zero non zero for all samples)<br />

• this process gives L signal values <strong>of</strong> x(m) for each window =><br />

window can be moved by L samples <strong>and</strong> the process repeated<br />

jωk<br />

• since Xnˆ( e ) is "undersampled" in time, it is highly susceptible<br />

to aliasing errors => need more robust synthesis procedure<br />

Overlap Addition <strong>of</strong> Bartlett <strong>and</strong> Hann<br />

Windows<br />

31<br />

33<br />

35<br />

Wideb<strong>and</strong> Spectrogram - Female<br />

nfft = 1024, L = 80 / 800, Overlap = 75 / 790<br />

Overlap Addition (OLA) Method<br />

⎡ jω ω ⎤<br />

k j kn<br />

yn ( ) = ∑∑ ⎢ Xm( e ) e ⎥<br />

m ⎣ k<br />

⎦<br />

• summation is for overlapping analysis sections<br />

jωk<br />

• for each value <strong>of</strong> m where Xm( e ) is measured, do an inverse FT to give<br />

ym( n) = Lx( n) w( m−n) (where L is the size <strong>of</strong> the FT)<br />

y ( n ) = y ( n ) = Lx ( n ) w ( m m− n )<br />

∑ m ∑<br />

m m<br />

• a basic property <strong>of</strong> the window is<br />

N−1<br />

j 0<br />

jωk<br />

We ( ) = We ( )| ω = 0 = ∑wn<br />

( )<br />

k<br />

n=<br />

0<br />

• since any set <strong>of</strong> samples <strong>of</strong> the window are equivalent (by sampling arguments),<br />

then if wn ( ) is sampled <strong>of</strong>ten<br />

enough we get (independent <strong>of</strong> n)<br />

j 0<br />

∑wm<br />

( − n) = We ( )<br />

m<br />

j 0<br />

yn ( ) = LxnWe ( ) ( ) using overlap-added sections<br />

Overlap Addition <strong>of</strong> Hamming Window<br />

L=128<br />

32<br />

34<br />

36<br />

6


Overlap Addition (OLA) Method<br />

Filter Bank Summation<br />

• the filter bank interpretation <strong>of</strong> the <strong>STFT</strong> shows that for<br />

jωk<br />

any frequency ωk,<br />

Xn( e ) is a lowpass representation<br />

<strong>of</strong> the signal in a b<strong>and</strong> centered at ω<br />

n( ∞<br />

jωk − jωkn ) = ∑<br />

m=−∞<br />

( − )<br />

k<br />

k(<br />

)<br />

jωkm X e e x n m w m e<br />

where wk( m)<br />

is the lowpass window used at frequency ωk<br />

(we have generalized the structure to allow a different<br />

lowpass window at each frequency ω ).<br />

Filter Bank Summation<br />

jωk<br />

• thus Xn( e ) is obtained by b<strong>and</strong>pass filtering x( n)<br />

followed by modulation with the complex exponential<br />

− jωkn e . We can express this in the form<br />

jωk jωkn y ( n) = X ( e ) e = x( n−m) h ( m)<br />

k n k<br />

m=−∞<br />

• thus yk( n)<br />

is the output <strong>of</strong> a b<strong>and</strong>pass filter with impulse<br />

response h ( n)<br />

k<br />

∞<br />

∑<br />

k<br />

37<br />

39<br />

41<br />

Overlap Addition (OLA) Method<br />

• 4-overlapping sections<br />

contribute to each interval<br />

• N-point FFT’s done using<br />

L speech p samples, p , with N-L<br />

zeros padded at end to<br />

allow modifications without<br />

significant aliasing effects<br />

• for a given value <strong>of</strong> n<br />

y(n)=x(n)w(R-n)+x(n)w(2Rn)+x(n)w(3R-n)+x(n)w(4Rn)=x(n)[w(R-n)+w(2Rn)+w(3R-n)+w(4R-n)]=x(n)<br />

W(ej0 )/R<br />

Filter Bank Summation<br />

• define a b<strong>and</strong>pass filter <strong>and</strong> substitute it in the<br />

equation to give<br />

ωk<br />

h ( n) = w ( n) e<br />

k k<br />

j n<br />

jωk − jωkn X ( e ) = e x( n−m) h ( m)<br />

n k<br />

m=−∞<br />

∞<br />

∑<br />

Filter Bank Summation<br />

38<br />

40<br />

42<br />

7


Filter Bank Summation<br />

Filter Bank Summation<br />

• consider a set <strong>of</strong> N b<strong>and</strong>pass filters, uniformly spaced, so that the entire<br />

frequency b<strong>and</strong> is covered<br />

2π<br />

k<br />

ωk<br />

= , k = 01 , ,..., N −1<br />

N<br />

• also assume window the same for all channels, i.e.,<br />

w wk ( n ) = w ( n n)<br />

), k = 01 , ,..., N − 1<br />

• if we add together all the b<strong>and</strong>pass outputs, the composite response is<br />

N−1 N−1<br />

jω jω<br />

j(<br />

ω−ωk) He ( ) = ∑Hk( e ) = ∑We<br />

( )<br />

k= 0 k=<br />

0<br />

jωk<br />

• if We ( ) is properly sampled in frequency ( N≥ L), where Lis<br />

the<br />

window duration, then it can be shown that<br />

N−1<br />

1<br />

j(<br />

ω−ωk) ∑We<br />

( ) = w(<br />

0)<br />

∀ω<br />

N<br />

FBS Formula<br />

k = 0<br />

45<br />

Filter Bank Summation<br />

• derivation <strong>of</strong> FBS formula<br />

FT / IFT jω<br />

wn ( ) ←⎯⎯⎯→We<br />

( )<br />

jω<br />

• if We ( ) is sampled in frequency at Nuniformly<br />

spaced p p points, , the inverse discrete Fourier transform<br />

jωk<br />

<strong>of</strong> the sampled version <strong>of</strong> We ( ) is (recall that<br />

sampling ⇒ multiplication ⇔ convolution ⇒ aliasing)<br />

N−1<br />

∑<br />

jωk jωkn ∞<br />

∑<br />

k= 0<br />

r=−∞<br />

1<br />

N<br />

We ( ) e = wn ( + rN)<br />

• an aliased version <strong>of</strong> wn ( ) is obtained.<br />

43<br />

47<br />

Filter Bank Summation<br />

• a practical method for reconstructing xn ( ) from the <strong>STFT</strong> is as follows<br />

jωk<br />

1. assume we know Xn( e ) for a set <strong>of</strong> N frequencies { ωk},<br />

k = 01 , ,..., N −1<br />

2. assume we have a set <strong>of</strong> N b<strong>and</strong>pass filters with impulse responses<br />

jωkn hk( n) = wk( n) e , k = 01 , ,..., N −1<br />

3. assume wk( n)<br />

is an ideal lowpass filter with cut<strong>of</strong>f frequency ωpk<br />

- the frequency response <strong>of</strong> the b<strong>and</strong>pass filter is<br />

jω<br />

j(<br />

ω−ωk) Hk( e ) = Wk( e )<br />

Filter Bank Summation<br />

Filter Bank Summation<br />

• If wn ( ) is <strong>of</strong> duration Lsamples,<br />

then<br />

wn ( ) = 0, n< 0,<br />

n≥ L<br />

• <strong>and</strong> no aliasing occurs due to sampling in frequency<br />

jω<br />

<strong>of</strong> We ( ). In this case if we evaluate the aliased<br />

formula for n = 0 0,<br />

we get<br />

N−1<br />

1<br />

jωk<br />

∑We<br />

( ) = w(<br />

0)<br />

N<br />

k = 0<br />

• the FBS formula is seen to be equivalent to the formula<br />

above, since (according to the sampling theorem) any<br />

jω<br />

set <strong>of</strong> N uniformly spaced samples <strong>of</strong> W( e<br />

) is adequate<br />

44<br />

46<br />

48<br />

8


Filter Bank Summation<br />

• the impulse response <strong>of</strong> the composite filter bank system is<br />

N−1 N−1<br />

<br />

jωkn hn ( ) = ∑hk( n) = ∑wne<br />

( ) = Nw( 0)<br />

δ ( n)<br />

k= 0 k=<br />

0<br />

• thus the composite output is<br />

yn ( ) = xn ( ) ∗ h h ( n ) = N w ( 0 ) x ( n )<br />

• thus for FBS method, the reconstructed signal is<br />

N−1 N−1<br />

jωk jωkn yn ( ) = ∑yk( n) = ∑Xn( e ) e = Nw( 0)<br />

xn ( )<br />

k= 0 k=<br />

0<br />

jωk<br />

• if Xn( e ) is sampled properly in frequency, <strong>and</strong> is independent <strong>of</strong> the<br />

shape <strong>of</strong> wn ( )<br />

Filter Bank Summation<br />

N−1 = ∑ k<br />

N−1<br />

= ∑ n<br />

2π j k<br />

N<br />

2π<br />

j kn<br />

N<br />

k= 0 k=<br />

0<br />

N−1<br />

2π 2π<br />

⎡ − j km⎤ j kn<br />

N N<br />

= ∑∑ ⎢ xmwn ( ) ( −me<br />

) ⎥ e<br />

k= 0 ⎣ m<br />

⎦<br />

N −1<br />

2π<br />

j k( n−m) N<br />

= ∑ ∑xmwn ( ) ( −m)<br />

∑ ∑e<br />

m k=<br />

0<br />

∞<br />

yn ( ) y( n) X( e ) e<br />

∑ ∑<br />

∑<br />

= xmwn ( ) ( −m) Nδ( n−m−rN) m r=−∞<br />

∞<br />

y( n) = N w( rN) x( n −rN)<br />

r =−∞<br />

•wn ( ) ≠ 0 for 0≤n≤L−1⇒ if N≥Lthen need<br />

only r = 0 term<br />

y(n) = Nw( 0)x(n)<br />

• if N < L then in order for y( n) = x( n) you need the condition w( rN) = 0, r = ± 1, ± 2,...<br />

• 'undersampled' representation can still work--at least in theory<br />

FBS Reconstruction in Non- Non<br />

Overlapping B<strong>and</strong>s<br />

• assume window length for all b<strong>and</strong>s is L samples<br />

• assume the same window is used for N equally spaced<br />

frequency b<strong>and</strong>s with analysis frequencies<br />

2π<br />

k<br />

ωk<br />

= , k = 01 , ,..., N −1<br />

N<br />

• where N can be less than L<br />

• assume w(n) is an ideal lowpass filter with cut<strong>of</strong>f frequency<br />

π<br />

ωp<br />

=<br />

N<br />

example with N=6<br />

equally spaced ideal<br />

filters<br />

49<br />

51<br />

53<br />

Filter Bank Summation<br />

Summary <strong>of</strong> FBS Method<br />

jω<br />

• perfect reconstruction <strong>of</strong> x(n) from Xn ( e ) is possible using FBS<br />

under the following conditions:<br />

1. w(n) is a finite duration filter/window<br />

jω<br />

2. Xn( e ) is sampled properly in both time<br />

<strong>and</strong> frequency<br />

j k<br />

perfect reconstruction <strong>of</strong> x(n) from Xn ( e ) is also possible using<br />

FBS under the following condition:<br />

ω<br />

•<br />

FBS under the following condition:<br />

jω<br />

We ( ) is perfectly b<strong>and</strong>limited<br />

jωk<br />

• To avoid time aliasing, Xn( e ) must be evaluated at at least L uniformly spaced<br />

frequencies, where L is the window duration<br />

-since window <strong>of</strong> length L samples has frequency b<strong>and</strong>width <strong>of</strong> from<br />

2π / L (for RW) to 4π<br />

/ L (for HW), the b<strong>and</strong>pass filters in FBS overlap<br />

in frequency since the analysis frequencies are 2 k/ L, k 01 , ,..., L 1<br />

j k<br />

there is a way (at least theoretically) for Xn( e ) to be evaluated in<br />

non-overlapping b<strong>and</strong>s<br />

52<br />

ω<br />

π = −<br />

•<br />

<strong>and</strong> for which x(n) can still be exactly recovered<br />

FBS Reconstruction in Non- Non<br />

Overlapping B<strong>and</strong>s<br />

• the composite impulse response for the FBS system is<br />

N−1 N−1<br />

jωkn jωkn hn ( ) = ∑wne ( ) = wn ( ) ∑ e<br />

k= 0 k=<br />

0<br />

• defining a composite <strong>of</strong> the terms being summed as<br />

N−1 N−1<br />

jωkn j2πkn/ N<br />

pn ( ) = ∑ e = ∑ e<br />

k= 0 k=<br />

0<br />

• we get for<br />

hn (<br />

)<br />

hn (<br />

) = wn ( ) pn ( )<br />

• it is easy to show (Prob. 6.7) that p(n) is a periodic train <strong>of</strong> impulses <strong>of</strong> the form<br />

∞<br />

pn ( ) = N∑ δ(<br />

n−rN) r =−∞<br />

• giving for hn (<br />

) the expression<br />

∞<br />

hn (<br />

) = N∑ wrN ( ) δ ( n−rN) r =−∞<br />

• thus the composite impulse response is the window sequence sampled<br />

at intervals <strong>of</strong> N<br />

samples<br />

50<br />

54<br />

9


FBS Reconstruction in Non- Non<br />

Overlapping B<strong>and</strong>s<br />

impulse response <strong>of</strong> ideal lowpass filter<br />

with cut<strong>of</strong>f frequency π/N<br />

• for ideal LPF we have<br />

sin( ππ n / N ) sin( ππ<br />

r ) 1<br />

w[n]= [ ] , wrN [ N]<br />

] = = δδ<br />

[ r ]<br />

πN πrN<br />

N<br />

giving hn [<br />

] = δ [ n]<br />

• other cases where perfect reconstruction is obtained<br />

1. wn [ ] is <strong>of</strong> finite length L< N<strong>and</strong><br />

causal (no images)<br />

2. wn [ ] has length > N <strong>and</strong> has the property<br />

wn [ ] = 1 / N, for n= rN 0<br />

= 0 for n = rN ( r ≠ r0, r = 0, ± 1, ± 2,...)<br />

giving hn [<br />

] = pnwn [ ] [ ] = δ [ n−rN 0 ]<br />

jω<br />

− jωr0N He ( ) = e ⇒ yn [ ] = xn ( −rN<br />

0 ]<br />

55<br />

Practical Implementation <strong>of</strong> FBS<br />

Filter Bank Reconstruction<br />

• frequency domain equations<br />

ˆ 1<br />

Ye ( ) PkFe ( ) ( )<br />

N−1<br />

2π<br />

j( ω−<br />

k)<br />

jω N<br />

= ∑<br />

⋅<br />

N k = 0<br />

R 1<br />

2π 2π 2π<br />

⎡ − 1<br />

j( ω− k− ) j(<br />

ω−<br />

)<br />

⎤<br />

N R R<br />

⎢ ∑ ∑We<br />

( ) Xe ( ) ⎥<br />

⎣ R =<br />

0<br />

⎦<br />

N−1<br />

2π 2π<br />

1<br />

j( ω− k) j( ω−<br />

k)<br />

jω N N<br />

= Xe ( ) ∑PkFe<br />

( ) ( ) We ( )<br />

RN k = 0<br />

R−1<br />

2π<br />

j( ω− )<br />

N−1<br />

2π 2π 2π<br />

1<br />

j(<br />

ω−<br />

k) j( ω−<br />

k−<br />

)<br />

R<br />

N N R<br />

+ ∑ Xe ( ) ⋅ ∑PkFe<br />

( ) ( We<br />

=<br />

1<br />

RN k = 0<br />

jω jω<br />

= He (<br />

) ⋅ Xe ( ) + aliasing terms<br />

57<br />

) ( )<br />

59<br />

x(n)<br />

Summary <strong>of</strong> FBS Reconstruction<br />

• for perfect reconstruction using FBS methods<br />

1. w(n) does not need to be either time-limited<br />

or frequency-limited to exactly reconstruct<br />

jωk<br />

x(n) from Xn ( e )<br />

2. w(n) just needs equally<br />

spaced zeros, spaced<br />

N samples apart for theoretically perfect reconstruction<br />

• exact reconstruction <strong>of</strong> the input is possible with a<br />

number <strong>of</strong> frequency channels less than that required<br />

by the sampling theorem<br />

• key issue is how to design digital filters that match these<br />

criteria<br />

Filter Bank Reconstruction<br />

e -j2π0n/N<br />

x<br />

e -j2π1n/N<br />

w(n)<br />

X n(e j2π0/N )<br />

R<br />

X n(e j2π1/N )<br />

<strong>Short</strong> Timme<br />

Modifications<br />

x w(n) R<br />

R f(n) x<br />

•<br />

•<br />

•<br />

e -j2π(N-1)n/N<br />

•<br />

•<br />

•<br />

•<br />

•<br />

•<br />

X n(e j2π(N-1)/N )<br />

R<br />

f(n)<br />

e j2π0n/N<br />

x w(n) R R f(n) x<br />

X rR(<br />

k)<br />

•<br />

•<br />

•<br />

Xˆ rR ( k)<br />

•<br />

•<br />

•<br />

x<br />

ej2π1n/N P(0)<br />

•<br />

•<br />

•<br />

e j2π(N-1)n/N<br />

P(1)<br />

P(N-1)<br />

yˆ k ( n)<br />

Filter Bank Reconstruction<br />

conditions for perfect reconstruction ( ˆ jω jω<br />

• Ye ( ) = Xe ( ))<br />

<strong>and</strong><br />

1<br />

He (<br />

) = PkFe ( ) ( ) We ( ) = 1<br />

1<br />

RN<br />

N−1<br />

2π 2π<br />

j( ω− k) j( ω−<br />

k)<br />

jω N N<br />

∑ RN k=<br />

0<br />

N −1<br />

∑<br />

k = 0<br />

Flat Gain<br />

2π 2π 2π<br />

j( ω− k) j( ω−<br />

k−<br />

)<br />

N N R<br />

PkFe ( ) ( ) We ( ) = 0 for = 12 , ,..., R<br />

Alias Cancellation<br />

+<br />

56<br />

yn ˆ( )<br />

58<br />

60<br />

10


Equivalent Filter Bank<br />

Filter Bank Reconstruction<br />

• consider a linear phase FIR system such that<br />

wn ( ) = wM ( −n), 0 ≤ n≤M = 0 otherwise<br />

• We = W e e<br />

jω jω − jωM / 2<br />

then ( ) real ( ) <strong>and</strong><br />

1<br />

2 2<br />

⎡ jω −jωM j( ω−π) −j( ω−π) M<br />

( W ( ) ) − ( ( ) ⎤ ) =<br />

4 ⎢ real e e Wreal e e<br />

⎣ ⎥⎦<br />

1<br />

2 2<br />

⎡ jω −jπM j(<br />

ω−π) ( Wreal ( e ) ) − e ( Wreal ( e ) ⎤ − jωM ) e<br />

4 ⎢⎣ ⎥⎦<br />

− jπM • if M is odd, then e = −1<strong>and</strong><br />

we get<br />

1<br />

2 2<br />

⎡ jω j(<br />

ω−π) ( W ( ) ) + ( ( ) ⎤ ) = 1<br />

4 ⎢ real e Wreal e<br />

⎣ ⎥⎦<br />

Quadrature Mirror Filters<br />

Lowpass filter<br />

designed by<br />

windowing<br />

Highpass filter is<br />

mirror image <strong>of</strong><br />

lowpass filter<br />

• practical realization <strong>of</strong> QMF filters<br />

• note that aliasing is cancelled, but the overall frequency<br />

response is not perfectly flat<br />

61<br />

63<br />

65<br />

Filter Bank Reconstruction<br />

• practical solution for the case wn ( ) = fn ( ), N= R=<br />

2 (2-b<strong>and</strong> solution)<br />

where the conditions for exact reconstruction become<br />

1<br />

jω jω j( ω−π) j(<br />

ω−π) ⎡ ( 0) ( ) ( ) + ( 1) ( ) ( ) ⎤ = 1<br />

4 ⎣<br />

P F e W e P F e W e<br />

⎦<br />

<strong>and</strong><br />

1<br />

jω j(<br />

ω−π) j( ω−π) j(<br />

ω−2π) ⎡P( 0)<br />

F( e ) W( e 1 0<br />

4 ⎣<br />

) + P( ) F( e ) W( e ) ⎤<br />

⎦<br />

=<br />

jω jω<br />

• aliasing cancels exactly if P( 0) = − P( 1) = 1(with<br />

F( e ) = W( e ))giving<br />

1<br />

2 2<br />

⎡ jω j( ω−π) − ω<br />

( ( ) ) − ( ( ) ⎤ j M<br />

We We ) = e<br />

4 ⎢⎣ ⎥⎦<br />

yn ˆ( ) = xn ( −M)<br />

Quadrature Mirror Filters<br />

jπn h0[ n] = w[ n] = g0[ n] 1 = = 1<br />

h[ n] w[ n] e g [ n]<br />

Tree Tree-Structured Structured QMF Filterbank<br />

62<br />

64<br />

66<br />

11


Discrete Wavelet Transforms<br />

FBS <strong>and</strong> OLA Comparisons<br />

duals<br />

• filter bank summation method ←⎯⎯⎯→overlap<br />

addition method<br />

-- one depends on sampling relation in frequency<br />

-- one depends on sampling relation in time<br />

• FBS requires sampling in frequency be such that the window<br />

jω<br />

transform We ( ) obeys the relation<br />

N−1<br />

1<br />

j(<br />

ω−ω k )<br />

∑ We ( ) = w ( 0 ) any ωω<br />

N<br />

k = 0<br />

• OLA requires that sampling in time be such that the window<br />

obeys the relation<br />

∞<br />

j 0<br />

∑ wrR ( − n) = We ( )/ R any n<br />

r =−∞<br />

• the key to <strong>Short</strong>-<strong>Time</strong> Fourier Analysis is the ability to modify<br />

the short-time spectrum (via quantization, noise enhancement,<br />

signal enhancement, speed-up/slow-down,etc) <strong>and</strong> recover<br />

an "unaliased" modified signal<br />

67<br />

69<br />

frequency<br />

<strong>STFT</strong> <strong>and</strong> CWT<br />

ω0 3ω 5ω 0 0<br />

ω0 2ω0<br />

4ω0<br />

8ω0<br />

time<br />

Constant b<strong>and</strong>width<br />

frequency<br />

Summary<br />

time<br />

Constant-Q b<strong>and</strong>width<br />

• <strong>Definition</strong> <strong>of</strong> short-term Fourier transform (<strong>STFT</strong>):<br />

∞<br />

jω−jωn •<br />

Xn( e ) = ∑ x[ m] w[ n−m] e<br />

m=−∞<br />

– keep ˆn fixed => <strong>STFT</strong> looks like a st<strong>and</strong>ard DFT =><br />

overlap-add (OLA) synthesis<br />

– keep ωˆω<br />

fixed => <strong>STFT</strong> looks like a filter bank => filter<br />

bank summation (FBS) synthesis<br />

– sampling rate in time based on window b<strong>and</strong>width;<br />

sampling rate in frequency based on window time width<br />

(length)<br />

– range <strong>of</strong> synthesis procedures for both FBS <strong>and</strong> OLA<br />

– <strong>STFT</strong> sequence can be displayed as an image =><br />

wideb<strong>and</strong>/narrowb<strong>and</strong> spectrograms<br />

70<br />

68<br />

12

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!