14.07.2013 Views

Lecture 12_winter_2012_6tp.pdf

Lecture 12_winter_2012_6tp.pdf

Lecture 12_winter_2012_6tp.pdf

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Digital Speech Processing<br />

Processing—<br />

<strong>Lecture</strong> <strong>12</strong><br />

Homomorphic<br />

Speech Processing<br />

Basic Speech Model<br />

• short segment of speech can be modeled as<br />

having been generated by exciting an LTI system<br />

either by a quasi-periodic impulse train, or a<br />

random noise signal<br />

• speech analysis => estimate parameters of the<br />

speech h model, d l measure th their i variations i ti ( (and d<br />

perhaps even their statistical variabilites-for<br />

quantization) with time<br />

• speech = excitation * system response<br />

=> want to deconvolve speech into excitation and<br />

system response => do this using homomorphic<br />

filtering methods<br />

Generalized Superposition for Convolution<br />

x[ n]<br />

*<br />

H { }<br />

*<br />

y[ n]<br />

= H { x[<br />

n]<br />

}<br />

x1[ n]<br />

∗ x2[<br />

n]<br />

H { x1[ n]<br />

} ∗H<br />

{ x2[<br />

n]<br />

}<br />

• for LTI systems we have the result<br />

yn [ ] = xn [ ] ∗ hn [ ] = xkhn [ ] [ −k]<br />

∞<br />

∑<br />

k =−∞<br />

• "generalized" superposition => addition replaced by convolution<br />

xn [ ] = x1[ n] ∗x2[<br />

n]<br />

yn [ ] = H { x[ n]} = H { x1[ n]} ∗H<br />

{ x2[<br />

n]}<br />

• homomorphic system for<br />

convolution<br />

1<br />

3<br />

5<br />

General Discrete Discrete-Time Time Model of<br />

Speech Production<br />

pL[ n] = p[ n] ∗hV[ n]<br />

−− Voiced Speech<br />

hV[ n] = AV ⋅g[ n] ∗v[ n] ∗r[<br />

n]<br />

pL[ n] = u[ n] ∗hU[ n]<br />

−− Unvoiced Speech<br />

hU[ n] = AN ⋅v[ n] ∗r[<br />

n]<br />

2<br />

Superposition Principle<br />

+ +<br />

x[ n]<br />

x 1[<br />

n]<br />

+ x2[<br />

n]<br />

L{<br />

}<br />

y[ n]<br />

= L{<br />

x[<br />

n]}<br />

L { x1[ n]<br />

} + L{<br />

x2[<br />

n]<br />

}<br />

xn [ ] = ax1[ n] + bx2[ n]<br />

y[ n] = L{ x[ n]} = aL{ x [ n]} + bL{ x [ n]}<br />

1 2<br />

Homomorphic Filter<br />

• homomorphic filter => homomorphic system ⎡<br />

⎣H⎤ ⎦ that<br />

passes the desired signal unaltered, while removing the<br />

undesired signal<br />

x ( n ) = x 1 1[ n] ∗ x 2 2[ n] - with x 1 1[<br />

n]<br />

the "undesired" signal g<br />

H { xn [ ]} = H { x1[ n]} ∗H{<br />

x2[ n]}<br />

H { x1[ n]} → δ(<br />

n) - removal of x1[ n]<br />

H { x2[ n]} → x2[ n]<br />

H { xn [ ]} = δ[<br />

n] ∗ x2[ n] = x2[ n]<br />

• for linear systems this is analogous to additive noise removal<br />

4<br />

6<br />

1


x[ n]<br />

x n x n<br />

Canonic Form for Homomorphic<br />

Deconvolution<br />

* + + + + *<br />

D ∗{<br />

} L{<br />

}<br />

−1<br />

D { }<br />

xˆ[ n]<br />

yn ˆ[ ] ∗ yn [ ]<br />

ˆ[ ] ˆ [ ]<br />

ˆ [ ] ˆ [ ]<br />

1[ ] ∗ 2[<br />

] x1 n + x2 n y1 n + y2 n<br />

1 ∗ 2<br />

y [ n] y [ n]<br />

• any homomorphic system can be represented as a cascade<br />

of three systems, e.g., for convolution<br />

1. system takes inputs combined by convolution and transforms<br />

them into additive outputs<br />

2. system is a conventional linear system<br />

3. inverse of first system--takes additive inputs and transforms<br />

them into convolutional outputs<br />

Properties of Characteristic<br />

Systems<br />

xn ˆ[ ] = D∗{ xn [ ]} = D∗{<br />

x1[ n] ∗x2[<br />

n]}<br />

= D∗{ x1[ n]} + D∗{<br />

x2[ n]}<br />

= x xˆ [ n ] + x xˆ [ n ]<br />

1 2<br />

D ˆ D ˆ ˆ<br />

{ y [ n]} { y [ n]}<br />

−1 ∗ { yn [ ]} =<br />

−1<br />

∗ { y1[ n] + y2[ n]}<br />

−1 = D∗ ˆ1 −1<br />

∗D∗<br />

ˆ2<br />

= y1[ n] ∗ y2[ n] = y[ n]<br />

Canonic Form for Deconvolution Using DTFTs<br />

• need to find a system that converts convolution to addition<br />

xn [ ] = x1[ n] ∗x2[<br />

n]<br />

jω jω jω<br />

Xe ( ) = X1( e ) ⋅X2(<br />

e )<br />

• since<br />

D { [ ]} ˆ ˆ ˆ<br />

∗ xn = x1[ n] + x2[ n] = xn [ ]<br />

⎡ jω ( ) ⎤ ˆ jω ˆ jω ˆ jω<br />

D∗<br />

Xe = X1( e ) + X2( e ) = Xe ( )<br />

⎣ ⎦<br />

=> use log function which<br />

converts products to sums<br />

ˆ jω ( ) log ⎡ jω ( ) ⎤ log ⎡ jω jω<br />

Xe = Xe = X1( e ) ⋅ X2( e ) ⎤<br />

⎣ ⎦ ⎣ ⎦<br />

log ⎡ jω 1( ) ⎤ log ⎡ jω ˆ ˆ<br />

2( ) ⎤ jω jω<br />

= X e + X e = X1( e ) + X2( e )<br />

⎣ ⎦ ⎣ ⎦<br />

ˆ jω ( ) ⎡ ˆ jω ˆ jω ˆ ω ˆ ω<br />

1( ) 2( ) ⎤ j j<br />

Ye = L X e + X e = Y1( e ) + Y2( e )<br />

⎣ ⎦<br />

jω ( ) exp ˆ jω = ˆ j<br />

Ye ⎡ ω<br />

Y1( e ) + Y2( e ) ⎤ jω jω<br />

= Y1( e ) ⋅Y2(<br />

e )<br />

⎣ ⎦<br />

7<br />

9<br />

Canonic Form for Homomorphic Convolution<br />

x[ n]<br />

x n x n<br />

* + + + + *<br />

D ∗{<br />

} L{<br />

}<br />

−1<br />

D { }<br />

xˆ[ n]<br />

yn ˆ[ ] ∗ yn [ ]<br />

ˆ ˆ<br />

ˆ ˆ<br />

1[ ] ∗ 2[<br />

] x1[ n] + x2[ n]<br />

y1[ n] + y2[ n]<br />

1 ∗ 2<br />

y [ n] y [ n]<br />

xn [ ] = x 1 [ n ] ∗ x 2 [ n ]<br />

- convolutional relation<br />

xn ˆ[ ] = D ˆ ˆ<br />

∗{<br />

xn [ ]} = x1[ n] + x2[ n]<br />

- additive relation<br />

yn ˆ[ ] = L{<br />

xˆ ˆ ˆ ˆ<br />

1[ n] + x2[ n]} = y1[ n] + y2[ n]<br />

- conventional linear system<br />

−1<br />

y[ n] = D { ˆ ˆ<br />

∗ y1[ n] + y2[ n]} = y1[ n] ∗y2[<br />

n]<br />

- inverse of convolutional relation<br />

=> design converted back to linear system, L<br />

D∗<br />

⎡⎣ ⎤⎦<br />

- fixed (called the characteristic system for homomorphic deconvolution)<br />

−1<br />

D∗ ⎡⎣ ⎤⎦<br />

- fixed (characteristic system for inverse homomorphic deconvolution)<br />

Discrete Discrete-Time Time Fourier<br />

Transform Representations<br />

p<br />

Characteristic System for<br />

Deconvolution Using DTFTs<br />

jω ∞<br />

∑<br />

n=−∞<br />

− jωn Xe ( ) = xne [ ]<br />

ˆ jω jω jω jω<br />

Xe ( ) = log ⎡<br />

⎣<br />

Xe ( ) ⎤<br />

⎦<br />

= log Xe ( ) + jarg ⎡<br />

⎣<br />

Xe ( ) ⎤<br />

⎦<br />

π<br />

1 ˆ jω jωn xn ˆ[ ] = ( ) ω<br />

2π<br />

∫ Xe e d<br />

−π<br />

8<br />

10<br />

<strong>12</strong><br />

2


Inverse Characteristic System for Deconvolution Using<br />

DTFTs<br />

jω ∞<br />

∑<br />

n=−∞<br />

− jωn Ye ˆ( ) = yne ˆ[<br />

]<br />

jω ( ) exp ˆ jω<br />

Ye = ⎡ ( ) ⎤<br />

⎣<br />

Ye<br />

⎦<br />

π<br />

1<br />

jω jωn yn [ ] = ( ) ω<br />

2π<br />

∫ Ye e d<br />

−π<br />

Problems with arg Function<br />

13<br />

{ (<br />

jω ) } ≠ { 1(<br />

jω<br />

) }<br />

jω<br />

+ ARG { X 2 ( e ) }<br />

ARG X e ARG X e<br />

jω jω<br />

{ Xe } = { X1e }<br />

jω<br />

+ arg { X2( e ) }<br />

arg ( ) arg ( )<br />

Complex and Real Cepstrum<br />

define the inverse Fourier transform of ˆ jω<br />

•<br />

Xe ( ) as<br />

π<br />

1<br />

ˆ[ ] ˆ jω jωn xn = ( ) ω<br />

2π<br />

∫ Xe e d<br />

−π<br />

• where xn ˆ[<br />

] called the "complex cepstrum" since a complex<br />

logarithm g is involved in the computation<br />

• can also define a "real<br />

cepstrum" using just the real part of<br />

the logarithm, giving<br />

π<br />

1<br />

[ ] Re ⎡ ˆ jω ( ) ⎤ jωn cn =<br />

ω<br />

2π<br />

∫ Xe e d<br />

⎣ ⎦<br />

−π<br />

π<br />

1<br />

jω jωn = log | ( ) | ω<br />

2π<br />

∫ Xe e d<br />

−π<br />

• can show that cn [ ] is the even part of xn ˆ[<br />

]<br />

15<br />

17<br />

Issues with Logarithms<br />

• it is essential that the logarithm obey the equation<br />

log ⎡ jω jω 1( ) 2( ) ⎤ log ⎡ jω 1( ) ⎤ log ⎡ jω<br />

X e ⋅ X e = X e + X2( e ) ⎤<br />

⎣ ⎦ ⎣ ⎦ ⎣ ⎦<br />

jω jω<br />

• this is trivial if X1( e ) and X2( e ) are real -- however usually<br />

jω jω<br />

X1( e ) and X2( e ) are complex<br />

• on the unit circle the complex log can be written in the form:<br />

arg ⎡ jω<br />

( ) ⎤<br />

jω jω<br />

j X e<br />

Xe ( ) = | Xe ( )| e ⎣ ⎦<br />

log ⎡ jω ( ) ⎤ ˆ jω ω ω<br />

( ) log ⎡ j<br />

= = | ( )| ⎤+ arg ⎡ j<br />

Xe Xe Xe j Xe ( ) ⎤<br />

⎣ ⎦ ⎣ ⎦ ⎣ ⎦<br />

• no problems with log magnitude term; uniqueness<br />

problems<br />

arise in defining the imaginary part of the log; can show that<br />

the imaginary part (the phase angle of the z-transform) needs<br />

to be a continuous odd function of ω<br />

Complex Cepstrum Properties<br />

i Given a complex logarithm that satisfies the phase continuity<br />

condition, we have:<br />

π<br />

1<br />

jω jω jωn xn ˆ[ ] = (log | X( e ) | jarg{ X( e )}) e dω<br />

2π<br />

∫<br />

+<br />

−π<br />

jω jω<br />

i If xn [ ] real, then log|X ( e ) | is an even function of ω and arg{ X( e )}<br />

is an odd function of ω.<br />

This means that the real and imaginary parts of<br />

the complex log have the appropriate symmetry for xn ˆ[ ] to be a real<br />

sequence, and xn ˆ[ ] can be represented as:<br />

xn ˆ[ ] = cn [ ] + dn [ ]<br />

jω<br />

where cn [ ] is the inverse DTFT of log |X ( e )| and the even part of xn ˆ[<br />

], and<br />

jω<br />

dn [ ] is the inverse DTFT of arg{ Xe ( )} and the odd part of xn ˆ[<br />

]:<br />

xn ˆ[ ] + xˆ[ −n] xn ˆ[ ] −xˆ[ −n]<br />

cn [ ] = ; dn [ ] =<br />

2 2<br />

Terminology<br />

• Spectrum – Fourier transform of signal autocorrelation<br />

• Cepstrum – inverse Fourier transform of log spectrum<br />

• Analysis – determining the spectrum of a signal<br />

• Alanysis – determining the cepstrum of a signal<br />

• Filtering – linear operation on time signal<br />

• Liftering – linear operation on cepstrum<br />

• Frequency – independent variable of spectrum<br />

• Quefrency – independent variable of cepstrum<br />

• Harmonic – integer multiple of fundamental frequency<br />

• Rahmonic – integer multiple of fundamental frequency<br />

14<br />

16<br />

18<br />

3


z-Transform Transform Representation<br />

i The z − transform of the signal:<br />

xn [ ] = x1[ n]* x2[ n]<br />

is of the form:<br />

X( z) = X1( z) ⋅X2(<br />

z)<br />

i With an appropriate definition of the complex log, we get:<br />

X Xˆ ( z ) = log{ X ( z )} = log{ X X1 ( z ) ⋅ X X2 ( z )}<br />

= log{ X1(<br />

z)} + log{ X2( z)}<br />

= Xˆ ( z) + Xˆ ( z)<br />

1 2<br />

Inverse Characteristic System for<br />

Deconvolution<br />

∞<br />

∑<br />

Yz ˆ( ) = ynz ˆ[<br />

]<br />

n=−∞<br />

−n<br />

[ ]<br />

Yz ( ) = exp ⎡ ˆ(<br />

) ⎤<br />

⎣<br />

Yz<br />

⎦<br />

= log Yz ( ) + jarg Yz ( )<br />

1<br />

n<br />

yn [ ] = ( )<br />

2π<br />

∫ Yzzdz<br />

j<br />

z-Transform Transform Cepstrum Alanysis<br />

i express X( z)<br />

as product of minimum-phase and<br />

maximum-phase signals, i.e.,<br />

i where<br />

X( z) = X ( z) ⋅z<br />

X ( z)<br />

min<br />

max<br />

−M<br />

0<br />

min max<br />

M<br />

i<br />

− 1<br />

∏ ( 1−<br />

k )<br />

A az<br />

X ( z)<br />

=<br />

k = 1<br />

N<br />

i<br />

−1<br />

∏(<br />

1−<br />

cz k )<br />

k = 1<br />

i all poles and zeros inside unit circle<br />

Mi<br />

Mi<br />

−1<br />

∏(<br />

) k ∏<br />

X ( z) = −b<br />

1−bz<br />

k = 1<br />

k = 1<br />

i all zeros outside unit circle<br />

( )<br />

k<br />

19<br />

21<br />

23<br />

Characteristic System for Deconvolution<br />

∞<br />

−n<br />

Xz ( ) = ∑ xnz [ ] =<br />

n=−∞<br />

jarg { X( z)<br />

}<br />

Xz ( ) e<br />

Xˆ ( z) = log X( z) = log X( z) + jarg X( z)<br />

[ ] [ ]<br />

1 ˆ n<br />

xn ˆ[ ] = ( )<br />

2π<br />

∫ X z zdz<br />

j<br />

z-Transform Transform Cepstrum Alanysis<br />

• consider digital systems with rational z-transforms of the general type<br />

Xz ( ) =<br />

M<br />

M<br />

A a z b z<br />

i<br />

0<br />

−1 ∏( 1− k ) ∏(<br />

1−<br />

−1 −1)<br />

k<br />

k= 1 k=<br />

1<br />

Ni<br />

−1<br />

∏(<br />

1−<br />

cz k )<br />

k = 1<br />

i we can express the above equation as:<br />

M0MiM0 −M0 −1 −1<br />

z A∏−b 1− 1−<br />

k ∏ akz ∏ bkz k= 1 k = 1<br />

k=<br />

1<br />

Xz ( ) =<br />

Ni<br />

−1<br />

∏(<br />

1−<br />

cz k )<br />

k = 1<br />

• with all coefficients ak, bk, ck < 1 => all ck<br />

poles and<br />

ak zeros are inside the unit circle; all bk<br />

zeros<br />

are outside the unit circle;<br />

( ) ( ) ( )<br />

z-Transform Transform Cepstrum Alanysis<br />

i can express xn [ ] as the convolution:<br />

xn [ ] = xmin[ n] ∗xmax[ n−M0] i minimum-phase component is causal<br />

x [ n ] = 0, , n<<br />

0<br />

min<br />

i maximum-phase component is anti-causal<br />

x [ n] = 0, n><br />

0<br />

max<br />

M0<br />

factor z is the shift in time ori<br />

−<br />

i gin by M 0<br />

samples required so that the overall sequence,<br />

xn<br />

[ ] be causal<br />

20<br />

22<br />

24<br />

4


z-Transform Transform Cepstrum Alanysis<br />

• the complex logarithm of Xz ( ) is<br />

( ) log ⎡⎣ ( ) ⎤⎦<br />

log |<br />

M0<br />

| ∑log<br />

|<br />

k = 1<br />

−1<br />

k |<br />

−M0<br />

log[ ]<br />

M M<br />

N<br />

Xz ˆ = Xz = A + b + z +<br />

i 0<br />

i<br />

− 1 − 1<br />

∑ ∑log ( 1− az k ) + ∑ ∑log ( 1−bz k ) − ∑ ∑log<br />

( 1−cz<br />

k )<br />

k= 1 k= 1 k=<br />

1<br />

• evaluating Xz ˆ ( ) on the unit circle we can ignore the term<br />

0<br />

related to log ⎡ jωM e ⎤ (as this contributes only to the imaginary<br />

⎣ ⎦<br />

part and is a linear phase shift)<br />

Cepstrum Properties<br />

1. complex cepstrum is non-zero and of infinite extent for<br />

both positive and negative n, even though x[ n]<br />

may be<br />

causal, or even of finite duration ( X( z)<br />

has only zeros).<br />

2. complex cepstrum is a decaying sequence that is bounded by:<br />

| n|<br />

α<br />

| xn ˆ[<br />

]| < β , for | n|<br />

→∞<br />

| n |<br />

3. zero-quefrency value of complex cepstrum (and the cepstrum)<br />

depends on the gain constant and the zeros outside the unit circle.<br />

Setting x ˆ[0] = 0 (and therefore c[0]<br />

= 0)<br />

is equivalent to normalizing<br />

the log magnitude spectrum to a gain constant of:<br />

M0<br />

−1<br />

A∏( − bk)<br />

= 1<br />

k = 1<br />

4. If X( z) has no zeros outside the unit circle (all bk=<br />

0), then:<br />

xn ˆ[ ] = 0, n<<br />

0 (minimum-phase<br />

signals)<br />

5. If X( z) has no poles or zeros inside the unit circle (all ak, ck<br />

= 0), then:<br />

xn ˆ[ ] = 0, n><br />

0 (maximum-phase signals)<br />

z-Transform Transform Cepstrum Alanysis<br />

• Example 2--consider<br />

the case of a digital system with a<br />

single zero outside the unit circle ( b < 1)<br />

x2( n) = δ( n) + bδ( n+<br />

1)<br />

X2 ( z ) = 1 + bz (zero at z =− 1 / b )<br />

ˆ<br />

2( ) = log [ 2(<br />

) ] = log(<br />

1+<br />

)<br />

∞ + 1<br />

( −1)<br />

= ∑ ( )<br />

= 1<br />

ˆ2<br />

n<br />

X z bz z b<br />

X z X z bz<br />

n n<br />

b z<br />

n n<br />

n+ 1 n<br />

( −1)<br />

b<br />

x ( n) = u( −n−1) n<br />

25<br />

27<br />

29<br />

z-Transform Transform Cepstrum Alanysis<br />

• we can then evaluate the remaining terms, use power series<br />

expansion for logarithmic terms (and take the inverse<br />

transform to give the complex cepstrum) giving:<br />

π<br />

1<br />

ˆ(<br />

) ˆ jω jωn xn = ( ) ω<br />

2π<br />

∫ Xe e d<br />

2π<br />

∫<br />

−π<br />

∞ n<br />

Z<br />

log(1 − Z) =− ∑ , | Z | < 1<br />

n=<br />

1 n<br />

M0<br />

−1<br />

= log | A | + ∑log<br />

| bk| k = 1<br />

n = 0<br />

Ni n Mi<br />

n<br />

ck ak<br />

= ∑ −<br />

n ∑ n<br />

k= 1 k=<br />

1<br />

n > 0<br />

M0 −n<br />

bk<br />

= ∑ n<br />

k = 1<br />

n < 0<br />

26<br />

z-Transform Transform Cepstrum Alanysis<br />

i The main z-transform formula for cepstrum alanysis is based on<br />

the power series expansion:<br />

∞ ( −1)<br />

∑<br />

n+<br />

1<br />

n<br />

log( 1+ x) = x x < 1<br />

n=<br />

1 n<br />

i EExample l 1<br />

--Apply A l thi this fformula l tto th the exponential ti l sequence<br />

n<br />

1<br />

x1( n)<br />

= aun ( ) ⇔ X1( z)<br />

= −1<br />

1−<br />

az<br />

n+<br />

1<br />

∞<br />

ˆ −1( −1)<br />

n −n<br />

X1( z) = log[ X1( z)] = −log( 1−<br />

az ) = −∑( −a)<br />

z<br />

n=<br />

1 n<br />

n ∞ n<br />

a 1<br />

ˆ 1( ) ( 1) ˆ<br />

− ⎛a ⎞ −n<br />

x n = u n− ⇔ X1( z) = −log( 1−<br />

az ) = ∑⎜<br />

⎟z<br />

n n=<br />

1 ⎝ n ⎠<br />

z-Transform Transform Cepstrum Alanysis for 2 Pulses<br />

• Example 3--an<br />

input sequence of two pulses of the form<br />

x3( n) = δ( n) + αδ( n− Np)<br />

( 0 < α < 1)<br />

−Np<br />

X3( z) = 1+<br />

αz<br />

ˆ<br />

−Np<br />

X3( z) = log ⎡⎣X3( z) ⎤⎦<br />

= log(<br />

1+<br />

αz<br />

)<br />

∞ n + 1<br />

( −1)<br />

n −nNp<br />

= ∑ α z<br />

n<br />

n=<br />

1<br />

∞<br />

k<br />

ˆ k + 1 α<br />

x3( n) = ∑(<br />

−1) δ ( n−kNp) k<br />

k = 1<br />

• the cepstrum is an impulse train with impulses spaced at p samples<br />

N<br />

N p<br />

2N 2Np 3N 3Np 28<br />

30<br />

5


Cepstrum for Train of Impulses<br />

• an important special case is a train of impulses<br />

of the form:<br />

∑ M<br />

xn ( ) = αδ(<br />

n−rN) r p<br />

r = 0<br />

M<br />

N<br />

α<br />

0<br />

−rNp<br />

∑ r<br />

r =<br />

Xz ( ) = z<br />

−Np −1<br />

• clearly Xz ( ) is a polynomial in z rather than z ;<br />

thus Xz ( ) can be expressed as a product of factors<br />

−Np<br />

Np<br />

of the form ( 1−az ) and ( 1−bz<br />

), giving a complex<br />

cepstrum, xn ˆ(<br />

), that is non-zero only at integer multiples of N<br />

z-Transform Transform Cepstrum Alanysis for<br />

Convolution of 3 Sequences<br />

i Example 5--consider<br />

the convolution of sequences 1, 2 and 3, i.e.,<br />

x5( n) = x1( n) ∗x2( n) ∗x3(<br />

n)<br />

n<br />

= ⎡<br />

⎣<br />

aun ( ) ⎤<br />

⎦<br />

∗ [ δ( n) + bδ( n+ 1)<br />

] ∗ ⎡<br />

⎣δ( n) + αδ(<br />

n−N) ⎤ p ⎦<br />

n n−N p n<br />

n− N p + 1<br />

α p α<br />

p<br />

= a u ( n ) + a u ( n n− N ) + ba u ( n n+ 1 ) + ba u ( n n− N + 1 )<br />

i The complex cepstrum is therefore the sum of the complex cepstra<br />

of the three sequences<br />

xˆ5( n) = xˆ1( n) + xˆ ˆ<br />

2( n) + x3( n)<br />

n ∞ k+ 1 k n+ 1 n<br />

a ( −1) α ( −1)<br />

b<br />

= un ( − 1) + ∑ δ ( n− kNp) + u( −n−1) n k n<br />

k = 1<br />

Homomorphic Analysis of<br />

Speech Model<br />

p<br />

31<br />

33<br />

35<br />

z-Transform Transform Cepstrum Alanysis for<br />

Convolution of 2 Sequences<br />

--consider the convolution of sequences 1 and 3, i.e.,<br />

4( ) 1( ) 3(<br />

) ( ) δ( ) αδ(<br />

)<br />

( ) α ( )<br />

The comple cepstr m is therefore the s mofthecomple<br />

−<br />

i Example 4<br />

n<br />

x n = x n ∗ x n = ⎡<br />

⎣<br />

a u n ⎤<br />

⎦<br />

∗ ⎡<br />

⎣ n + n−N ⎤ p ⎦<br />

n<br />

n Np<br />

= aun + a un−Np i The complex cepstrum is therefore the sum of the complex<br />

cepstra<br />

of the two sequences (since convolution in the time domain is<br />

converted to addition in the cepstral domain)<br />

xˆ ( n) = xˆ ( n) + xˆ ( n)<br />

4 1 3<br />

n ∞ k+ 1 k<br />

a ( −1)<br />

α<br />

= un ( − 1)<br />

+ ∑ δ ( n−kNp) n k<br />

k = 1<br />

Example: a=.9, b=.8, a=.7, Np=15<br />

( 1)<br />

+<br />

− b<br />

n<br />

n 1 n<br />

a<br />

n<br />

n<br />

∞<br />

∑<br />

k = 1<br />

32<br />

( 1 α )<br />

−Np<br />

( 1+<br />

bz)<br />

Xz ( ) = + z<br />

−1<br />

( 1−<br />

az )<br />

k+ 1 k<br />

( −1)<br />

α<br />

δ [ n−kNp] k<br />

Homomorphic Analysis of Speech Model<br />

• the transfer function for voiced speech is of the form<br />

HV( z) = AV ⋅G(<br />

z) V( z) R( z)<br />

i with effective impulse response for voiced speech<br />

h hV [ n ] = A AV ⋅ g [ n ] ∗ v [ n ] ∗ r [ n ]<br />

• similarly for unvoiced speech we have<br />

H U ( z) = AU⋅V( z) R( z)<br />

i with effective impulse response for unvoiced speech<br />

h [ n] = A ⋅v[ n] ∗r[<br />

n]<br />

U U<br />

34<br />

36<br />

6


Complex Cepstrum for Speech<br />

• the models for the speech components are as follows:<br />

Mi<br />

M0<br />

−M−1 ∏ 1− k ∏ 1−<br />

k<br />

k= 1 k=<br />

1<br />

Ni<br />

−1<br />

∏(<br />

1−<br />

cz k )<br />

k = 1<br />

a k = b k = 0<br />

Az ( a z ) ( b z)<br />

1. vocal tract: Vz ( ) =<br />

--for o voiced o ced speec speech, , oonly y po poles es => a b , aall<br />

k<br />

--unvoiced speech and nasals,<br />

need pole-zero model but all poles are<br />

inside the unit circle => ck<br />

< 1<br />

--all speech has complex poles and zeros that occur in complex conjugate<br />

pairs<br />

−1<br />

2. radiation model: Rz ( ) ≈1−z(high frequency emphasis)<br />

3. glottal pulse model: finite duration pulse with transform<br />

Li<br />

L0<br />

−1<br />

Gz ( ) = B∏( 1−αkz) ∏(<br />

1−βkz)<br />

k= 1 k=<br />

1<br />

with zeros both inside and outside the unit circle<br />

Simplified Speech Model<br />

• short-time speech model<br />

x[ n] = w[ n] ⋅ p[ n] ∗g[ n] ∗v[ n] ∗r[<br />

n]<br />

≈ pw [ n ] ∗ h hv [ n ]<br />

• short-time complex cepstrum<br />

xn ˆ[ ] = pˆ [ n] + gn ˆ[ ] + vn ˆ[<br />

] + rn ˆ[<br />

]<br />

w<br />

[ ]<br />

Time Domain Analysis<br />

37<br />

39<br />

41<br />

Complex Cepstrum for Voiced<br />

Speech<br />

• combination of vocal tract, glottal pulse and<br />

radiation will be non-minimum phase =><br />

complex cepstrum exists for all values of n<br />

• the complex cepstrum will decay rapidly for<br />

large n (due to polynomial terms in expansion<br />

of complex cepstrum)<br />

• effect of the voiced source is a periodic pulse<br />

train for multiples of the pitch period<br />

Analysis of Model for Voiced Speech<br />

i Assume sustained /AE/ vowel with fundamental frequency of <strong>12</strong>5 Hz<br />

i Use glottal pulse model of the form:<br />

⎧0.5<br />

[1 − cos( π ( n+ 1) / N1)] 0 ≤n≤ N1−1<br />

⎪<br />

gn [ ] = ⎨cos(0.5<br />

π ( n+ 1 −N1)/ N2) N1 ≤n≤ N1+ N2<br />

−2<br />

⎪<br />

⎩ 0<br />

otherwise<br />

N1 = 25, N2<br />

= 10<br />

⇒ 34 sample impulse response, with transform<br />

33 33<br />

−33 −1<br />

Gz ( ) = z ∏( −bk ) ∏(1<br />

−bz k ) ⇒ all roots outside unit circle ⇒ maximum phase<br />

k= 1 k=<br />

1<br />

i Vocal tract system specified by 5 formants (frequencies and bandwidths)<br />

1<br />

V( z)<br />

= 5<br />

−2πσ kT −1 −4πσ<br />

kT<br />

−2<br />

∏(1−<br />

2e cos(2 π FT k ) z + e z )<br />

k = 1<br />

{ F , σ } = [(660,60),(1720,100),(2410,<strong>12</strong>0),(3500,175),(4500,250)]<br />

k k<br />

i Radiation load is simple first difference<br />

−1<br />

( ) 1 , 0.96<br />

Rz = − γzγ =<br />

Pole Pole-Zero Zero Analysis of Model<br />

Components<br />

38<br />

40<br />

42<br />

7


Spectral Analysis of Model<br />

Complex Cepstrum of Model<br />

i The voiced speech signal is modeled as:<br />

xn [ ] = AV ⋅gn [ ] ∗vn [ ] ∗rn [ ] ∗pn<br />

[ ]<br />

i with complex cepstrum:<br />

sn ˆ[ ] = log| A | [ ] ˆ[ ] ˆ[ ] ˆ[ ] ˆ<br />

V δ n + gn + vn + rn + pn [ ]<br />

i glottal pulse is maximum phase ⇒ gn ˆ[ [ ] = 00, n > 0<br />

i vocal tract<br />

and radiation systems are minimum phase<br />

⇒ vn ˆ[ ] = 0, n< 0, rn ˆ[<br />

] = 0, n<<br />

0<br />

∞ k<br />

ˆ −N β<br />

p −kNp<br />

Pz ( ) =−log(1 − β z ) = ∑ z<br />

k = 1 k<br />

∞ k<br />

β<br />

pn ˆ[ ] = ∑ δ[<br />

n−kNp] k<br />

k = 1<br />

Resulting Complex and Real<br />

Cepstra<br />

43<br />

45<br />

47<br />

Speech Model Output<br />

Cepstral Analysis of Model<br />

Frequency Domain Representations<br />

44<br />

46<br />

48<br />

8


Frequency<br />

Frequency-Domain Domain Representation of<br />

Complex Cepstrum<br />

Inverse System- System DFT Implementation<br />

Cepstral Computation Aliasing<br />

49<br />

51<br />

N=256, N Np=75, =75,<br />

α=0.8 =0.8<br />

Circle dots are<br />

are<br />

cepstrum<br />

values in<br />

correct<br />

locations; all<br />

other dots are<br />

results of<br />

aliasing due to<br />

finite range<br />

computations<br />

53<br />

The Complex Cepstrum-DFT Cepstrum DFT Implementation<br />

2π j k<br />

N<br />

∞<br />

∑<br />

n n=−∞∞<br />

2π<br />

− j kn<br />

N<br />

Xk [ ] = Xe ( ) = xne [ ] k= 01 , ,..., N−1,<br />

jω<br />

i Xp[ k] is the N point DFT corresponding to X( e )<br />

Xk ˆ[ ] = Xe ˆ(<br />

) = log Xk [ ] = log Xk [ ] + jarg Xk [ ]<br />

j2πk/ N { } { }<br />

N−1<br />

∑ ˆ<br />

2π<br />

j kn<br />

N<br />

∞<br />

∑<br />

01 1<br />

1<br />

xn ˆ[<br />

] = Xke [ ] = xn ˆ[<br />

+ rN] n= , ,..., N−<br />

N k= 0<br />

r=−∞<br />

• x ˆ[<br />

n] is an aliased version of xˆ[ n]<br />

⇒use<br />

as large a value of N as possible to minimize aliasing<br />

The Cepstrum-DFT Cepstrum DFT Implementation<br />

π<br />

1<br />

jω jωn cn [ ] = log| ( )| ω<br />

2π<br />

∫ Xe e d −∞ < n<<br />

∞<br />

−π<br />

i Approximation to cepstrum using DFT:<br />

2π 2π<br />

j k<br />

∞<br />

− j kn<br />

N N<br />

Xk [ ] = Xe ( ) = ∑ xne [ ] k= 01 , ,..., N−1,<br />

n=−∞<br />

N −1<br />

1<br />

j2πkn/ N<br />

cn [<br />

] = ∑log|<br />

Xk [ ]| e , 0≤ n≤N−1 N k = 0<br />

∞<br />

cn (<br />

) = ∑ cn [ + rN] n= 01 , ,..., N−1<br />

r =−∞<br />

• cn [<br />

] is an aliased version of cn [ ] ⇒use<br />

as large a value of N<br />

as possible to minimize aliasing<br />

ˆ[ ] + ˆ<br />

<br />

xn x[ −n]<br />

cn ( ) =<br />

2<br />

Summary<br />

1. Homomorphic System for Convolution:<br />

z( ) log( ) L( ) exp( ) z-1 ^ ^ x[n] X(z) X(z)<br />

Y(z)<br />

Y(z)<br />

y[n]<br />

( )<br />

X(z)<br />

^<br />

z-1 ( )<br />

x[n] ^<br />

Linear<br />

System<br />

y[n] ^<br />

z( )<br />

Y(z)<br />

^<br />

2. Practical Case:<br />

z( ) → DFT<br />

−1<br />

z () → IDFT<br />

jω<br />

jω jω<br />

jarg { X( e ) }<br />

Xe ( ) = Xe ( ) e<br />

jω jω jω<br />

log ⎡<br />

⎣<br />

Xe ( ) ⎤<br />

⎦<br />

= log Xe ( ) + jarg Xe ( )<br />

{ }<br />

50<br />

52<br />

54<br />

9


Summary<br />

3. Complex Cepstrum:<br />

1 ˆ jω jωn xn ˆ[<br />

] = ( ) ω<br />

2 ∫ Xe e d<br />

44. CCepstrum: t<br />

π<br />

π −π<br />

π<br />

1<br />

jω jωn cn [ ] = log ( ) ω<br />

2π<br />

∫ Xe e d<br />

−π<br />

xn ˆ[ ] + xˆ[ −n]<br />

cn [ ] = = even part of xn ˆ[<br />

]<br />

2<br />

Complex Cepstrum Without Phase<br />

Unwrapping<br />

i short-time analysis uses finite-length windowed segments, x[ n]<br />

M<br />

−n<br />

th<br />

X( z) = ∑ x[ n] z , M -order polynomial<br />

n=<br />

0<br />

i Find polynomial roots<br />

Mi ∏<br />

M0<br />

−1 m ∏<br />

−1 −1<br />

m<br />

m= 1 m=<br />

1<br />

X( z) = x[0] (1 −a z ) (1 −b<br />

z )<br />

i am<br />

roots are inside unit circle (minimum-phase<br />

part)<br />

i bm<br />

roots are outside unit circle (maximum-phase part)<br />

−1 −1<br />

i Factor out terms of form − bmz giving:<br />

MiM0 −M 0<br />

−1<br />

X( z) = Az (1 −a z ) (1 −b<br />

z)<br />

m m<br />

m= 1 m=<br />

1<br />

M0<br />

M0<br />

−1<br />

∏ m<br />

m=<br />

1<br />

A= x[0]( −1)<br />

b<br />

∏ ∏<br />

i Use polynomial root finder<br />

to find the zeros that lie inside<br />

and outside the unit circle and solve directly for xn ˆ[ ] .<br />

Recursive Relation for Complex<br />

Cepstrum for Minimum Phase Signals<br />

• the complex cepstrum for minimum phase signals<br />

can be computed recursively from the input signal,<br />

xn ( ) using the relation<br />

ˆ( ) = 0 < 0<br />

= log ⎡⎣ ( 0) ⎤⎦<br />

= 0<br />

−1<br />

( ) ⎛ ⎞ ( − )<br />

= − ˆ( )<br />

0<br />

( 0) ∑⎜ ⎟<br />

><br />

⎝ ⎠ ( 0)<br />

n<br />

xn n<br />

x n<br />

xn k xn k<br />

xk n<br />

x n x<br />

k = 0<br />

55<br />

57<br />

59<br />

x(n)<br />

Summary<br />

~<br />

X(k X(k) ^ ^<br />

X(k X(k)<br />

x(n x(n)<br />

DFT Complex Log IDFT<br />

5. Practical Implementation of Complex Cepstrum:<br />

∞<br />

j( 2π/ N) k − j( 2π/<br />

N) kn<br />

Xk [ ] = Xe ( ) = ∑ xne [ ] )<br />

n=−∞<br />

X ˆ [ k] = log { Xp( k) } = log Xp( k) + jarg { Xp( k)<br />

}<br />

N−1∞<br />

1 ˆ j( 2π<br />

/ N<br />

ˆ ) kn<br />

xn [ ] = ∑Xke [ ] = ∑ xn ˆ[<br />

+ rN]<br />

⇒aliasing<br />

N k= 0<br />

r=−∞<br />

xn ˆ[ ] = aliased version<br />

of xn ˆ[<br />

]<br />

6. Examples:<br />

∞ r<br />

1<br />

a<br />

Xz ( ) = ⇔ xn ˆ[<br />

] = δ [ ]<br />

−1<br />

∑ n−r 1−<br />

az r = 1 r<br />

∞ r<br />

b<br />

Xz ( ) = 1−bz⇔<br />

xn ˆ[<br />

] = − ∑ δ [ n+ r]<br />

r = 1 r<br />

Cepstrum for Minimum Phase Signals<br />

• for minimum phase signals (no poles or zeros outside unit circle) the complex cepstrum<br />

can be completely represented by the real part of the Fourier transforms<br />

• this means we can represent the complex<br />

cepstrum of minimum phase signals by the log<br />

of the magnitude of the FT alone<br />

• since the real part of the FT is the FT of the even part of the sequence<br />

ˆ<br />

ˆ( ) ˆ(<br />

)<br />

Re ⎡ jω ⎡ + − ⎤<br />

( ) ⎤<br />

xn x n<br />

Xe = FT<br />

⎣ ⎦ ⎢<br />

2<br />

⎥<br />

⎣ 2 ⎦<br />

jω<br />

FT ⎡⎣c( n)<br />

⎤⎦<br />

= log Xe ( )<br />

xn ˆ( ) + xˆ( −n)<br />

cn ( ) =<br />

2<br />

• giving<br />

xn ˆ(<br />

) = 0 n<<br />

0<br />

= cn ( ) n=<br />

0<br />

= 2cn ( ) n><br />

0<br />

• thus the complex cepstrum (for minimum phase signals) can be computed by computing<br />

the cepstrum and using the equation above<br />

58<br />

Recursive Relation for Complex<br />

Cepstrum for Minimum Phase Signals<br />

xn ( ) ←⎯→ Xz ( ) 1. basic z-transform<br />

dX( z)<br />

nx( n) ←⎯→− z =−zX′<br />

( z) dz<br />

2. scale by n rule<br />

xn ˆ( ) ←⎯→ Xz ˆ ( ) = log [ Xz ( ) ] 3. definition of complex cepstrum<br />

dXˆ ( z) d X′ ( z)<br />

= [ log[ Xz ( )] ] =<br />

dz dz X( z)<br />

4. differentiation of z-transform<br />

dXˆ ( z)<br />

−z<br />

Xz ( ) =−zX′<br />

( z)<br />

dz<br />

5. multiply both sides of equation<br />

56<br />

60<br />

10


Recursive Relation for Complex<br />

Cepstrum for Minimum Phase Signals<br />

dXˆ ( z)<br />

nxˆ( n) ∗x( n) ←⎯→− z X( z) =−zX′ ( z) ←⎯→ nx( n)<br />

dz<br />

∞<br />

nx( n) = xˆ ( k) x( n −k)<br />

k<br />

∑<br />

k =−∞<br />

( )<br />

i for minimum phase systems we have xn ˆ(<br />

) = 0for n<<br />

0,<br />

xn ( ) = 0 for n < 0 0,<br />

giving:<br />

n<br />

( ) ˆ<br />

⎛k⎞ xn = ∑ xkxn ( ) ( −k) ⎜ ⎟<br />

k = 0<br />

⎝n⎠ i separating out the term for k = n we get:<br />

n−1<br />

ˆ<br />

⎛k⎞ xn ( ) = ∑ xkxn ( ) ( − k) ⎜ ⎟+<br />

x( 0)<br />

xn ˆ(<br />

)<br />

k = 0<br />

⎝n⎠ n−1<br />

xn ( ) xn ( − k) ˆ( ) ˆ<br />

⎛k⎞ xn = − ∑ xk ( ) , > 0<br />

( 0) = 0 ( 0)<br />

⎜ ⎟ n<br />

x k x ⎝n⎠ xˆ( 0) = log x( 0) , xˆ( n) = 0, n < 0<br />

[ ]<br />

Cepstrum for Maximum Phase Signals<br />

• for maximum phase signals (no poles or<br />

zeros inside unit circle)<br />

xn ˆ( ) + xˆ( −n)<br />

cn ( ) =<br />

2<br />

• giving<br />

xn ˆ(<br />

) = 0 n><br />

0<br />

= cn ( ) n=<br />

0<br />

= 2cn ( ) n<<br />

0<br />

• thus the complex cepstrum (for maximum<br />

phase signals) can be computed<br />

by computing<br />

the cepstrum and using the equation above<br />

Computing Short-Time<br />

Short Time<br />

Cepstrums from Speech<br />

U Using i Polynomial P l i l Roots R t<br />

61<br />

63<br />

65<br />

Recursive Relation for Complex<br />

Cepstrum for Minimum Phase Signals<br />

i why is xˆ( 0) = log[ x(<br />

0)]?<br />

i assume we have a finite sequence xn ( ), n= 01 , ,..., N−1<br />

i we can write xn ( ) as:<br />

xn ( ) = x( 0) δ( n) + x( 1) δ( n− 1) + ... + xN ( −1) δ(<br />

N−1)<br />

⎡ x() 1 x( N −1)<br />

⎤<br />

= x( 0) ⎢δ( n) + δ( n− 1) + ... + δ(<br />

N −1)<br />

⎣ x( 0) x(<br />

0)<br />

⎥<br />

⎦<br />

i taking z − transforms, we get:<br />

N−1<br />

NZ1 NZ 2<br />

−n−1 Xz ( ) = ∑ xnz ( ) = G∏( 1−az k ) ∏(<br />

1−bz<br />

k )<br />

n= 0 k= 1 k=<br />

1<br />

i where the first term is the gain, G = x(<br />

0),<br />

and the two product terms are the<br />

zeros inside and outside the<br />

unit circle.<br />

i for minimum phase systems we have all zeros inside the unit circle so the<br />

second product term is gone, and we have the result that<br />

xˆ( 0) = log[ G] = log[ x( 0)]; xˆ( n) = 0, n < 0<br />

NZ1 a<br />

xn ˆ( ) =<br />

0<br />

k=1<br />

n ⎛ ⎞ k − ∑⎜<br />

⎟ n ><br />

⎝ n ⎠<br />

Recursive Relation for Complex<br />

Cepstrum for Maximum Phase Signals<br />

• the complex cepstrum for maximum phase signals<br />

can be computed recursively from the input signal,<br />

xn ( ) using the relation<br />

xn ˆ( ) = 0 n><br />

0<br />

= log ⎡⎣x( 0) ⎤⎦<br />

n = 0<br />

xn ( )<br />

= −<br />

x( 0) 0<br />

⎛k⎞ xn ( − k)<br />

∑ ⎜ ⎟xk<br />

ˆ( )<br />

⎝n⎠ x(<br />

0)<br />

n<<br />

0<br />

k= n+<br />

1<br />

Cepstrum From Polynomial Roots<br />

62<br />

64<br />

66<br />

11


Cepstrum From Polynomial Roots<br />

Practical Considerations<br />

• window to define short-time analysis<br />

• window duration (should be several pitch<br />

periods long)<br />

•size i of f FFT (to (t minimize i i i aliasing) li i )<br />

• elimination of linear phase components<br />

(positioning signals within frames)<br />

• cutoff quefrency of lifter<br />

• type of lifter (low/high quefrency)<br />

Voiced Speech Example<br />

67<br />

69<br />

Hamming window<br />

40 msec duration<br />

(section beginning<br />

at sample 13000<br />

in file<br />

test_16k.wav)<br />

71<br />

Computing Short-Time<br />

Short Time<br />

Cepstrums from Speech<br />

Using the DFT<br />

Computational Considerations<br />

Voiced Speech Example<br />

68<br />

70<br />

wrapped phase<br />

unwrapped<br />

phase<br />

72<br />

<strong>12</strong>


Voiced Speech Example<br />

Complex Cepstrum of Speech<br />

• model of speech:<br />

– voiced speech produced by a quasi-periodic<br />

pulse train exciting slowly time-varying linear<br />

system y => p[n] p[ ] convolved with hv[n] v[ ]<br />

– unvoiced speech produced by random noise<br />

exciting slowly time-varying linear system =><br />

u[n] convolved with hv[n] • time to examine full model and see what<br />

the complex cepstrum of speech looks like<br />

Voiced Speech Example<br />

73<br />

75<br />

Cepstrally<br />

smoothed log<br />

magnitude, 50<br />

quefrencies<br />

cutoff<br />

Cepstrally<br />

unwrapped<br />

phase, 50<br />

quefrencies<br />

cutoff<br />

77<br />

Characteristic System for<br />

Homomorphic Convolution<br />

• still need to define (and design) the L<br />

operator part (the linear system<br />

component) p ) of the system y to completely p y<br />

define the characteristic system for<br />

homomorphic convolution for speech<br />

– to do this properly and correctly, need to look<br />

at the properties of the complex cepstrum for<br />

speech signals<br />

Homomorphic Filtering of Voiced<br />

Speech<br />

• goal is to separate out the<br />

excitation impulses from the<br />

remaining components of the<br />

complex cepstrum<br />

• use cepstral window, l(n), to<br />

separate excitation pulses from<br />

combined vocal tract<br />

– l(n)=1 for |n|


Voiced Speech Example<br />

High quefrency<br />

liftering; cutoff<br />

quefrency=50;<br />

log magnitude<br />

and unwrapped<br />

phase<br />

Unvoiced Speech Example<br />

79<br />

Hamming window<br />

40 msec duration<br />

(section beginning<br />

at sample 3200 in<br />

file test_16k.wav)<br />

Unvoiced Speech Example<br />

81<br />

83<br />

Voiced Speech Example<br />

Estimated<br />

excitation<br />

function for<br />

voiced speech<br />

(Hamming<br />

window<br />

weighted)<br />

Unvoiced Speech Example<br />

80<br />

wrapped phase<br />

unwrapped<br />

phase<br />

Unvoiced Speech Example<br />

82<br />

Cepstrally<br />

smoothed log<br />

magnitude, 50<br />

quefrencies<br />

cutoff<br />

Cepstrally<br />

unwrapped<br />

phase, 50<br />

quefrencies<br />

cutoff<br />

84<br />

14


Unvoiced Speech Example<br />

Estimated<br />

excitation source<br />

for unvoiced<br />

speech section<br />

(Hamming<br />

window<br />

weighted)<br />

Review of Cepstral Calculation<br />

• 3 potential methods for computing cepstral<br />

coefficients, xn ˆ[ ] , of sequence x[n]<br />

– analytical method; assuming X(z) is a rational<br />

function; find poles and zeros and expand using log<br />

power series i<br />

– recursion method; assuming X(z) is either a minimum<br />

phase (all poles and zeros inside unit circle) or<br />

maximum phase (all poles and zeros outside unit<br />

circle) sequence<br />

– DFT implementation; using windows, with phase<br />

unwrapping (for complex cepstra)<br />

Cepstral Computation Aliasing<br />

i Effect of quefrency aliasing via a simple example<br />

xn [ ] = δ[ n] + αδ[<br />

n−Np] i with discrete-time Fourier transform<br />

j ( ) ω<br />

j N<br />

X e 1 e ω<br />

α −<br />

= +<br />

p<br />

i We can express the complex logarithm as<br />

ˆ j<br />

j Np<br />

X( e ) log{1 e }<br />

ω<br />

ω<br />

α −<br />

∞ m+ 1 m ⎛( −1)<br />

α ⎞<br />

= + = ∑⎜<br />

⎟e<br />

m=<br />

1⎝<br />

m ⎠<br />

i giving a complex cepstrum in the form<br />

∞ m+ 1 m ⎛ ⎞<br />

( −1)<br />

α<br />

xn ˆ[ ] = ∑⎜<br />

⎟δ[<br />

n−mNp] m=<br />

1⎝<br />

m ⎠<br />

− jωmNp 85<br />

87<br />

89<br />

Short Short-Time Time Homomorphic Analysis<br />

STFT<br />

Example 11—single<br />

single pole sequence<br />

(computed using all 3 methods)<br />

[ ] = [ ]<br />

n<br />

xn aun<br />

n<br />

a<br />

xn ˆ[ ] = un [ −1]<br />

n<br />

Example 22—voiced<br />

voiced speech frame<br />

86<br />

88<br />

90<br />

15


Example 33—low<br />

low quefrency liftering<br />

Example 4—effects 4 effects of low quefrency lifter<br />

Example 66—phase<br />

phase unwrapping<br />

91<br />

93<br />

95<br />

Example 33—high<br />

high quefrency liftering<br />

Example 55—phase<br />

phase unwrapping<br />

Homomorphic Spectrum Smoothing<br />

92<br />

94<br />

96<br />

16


Running Cepstrum<br />

Running Cepstrums<br />

Cepstrum Distance Measures<br />

i The cepstrum forms a natural basis for comparing<br />

patterns in speech recognition or vector quantization<br />

because of its stable mathematical characterization<br />

for speech signals<br />

i A typical "cepstral distance<br />

measure" is of the form:<br />

nco<br />

2<br />

D = ([] cn −cn<br />

[])<br />

∑<br />

n=<br />

1<br />

where cn [ ] and cn [ ] are cepstral sequences corresponding<br />

to frames of signal, and D is the cepstral distance between<br />

the pair of sequences.<br />

i Using Parseval's theorem, we can express the cepstral<br />

distance in the frequency domain as:<br />

1 π<br />

jω jω<br />

2<br />

D (log | H( e ) | log | H( e ) |) dω<br />

2π<br />

π<br />

Thus we see that the cepstral distance is actually a log<br />

magnitude spe<br />

−<br />

= ∫<br />

−<br />

i<br />

ctral distance<br />

97<br />

99<br />

101<br />

Running Cepstrum<br />

Cepstrum Applications<br />

Mel Frequency Cepstral Coefficients<br />

i Basic idea is to compute a frequency analysis based on a filter<br />

bank with approximately critical band spacing of the filters and<br />

bandwidths. For 4 kHz bandwidth, approximately 20 filters are used.<br />

i First perform a short-time Fourier analysis, giving Xm[ k], k = 0,1,..., NF / 2<br />

where m is the frame number and k is the frequency index (1 to half<br />

the size of the FFT)<br />

i Next the DFT values are grouped g p together g in critical bands and weighted g<br />

by triangular weighting functions.<br />

98<br />

100<br />

102<br />

17


Mel Frequency Cepstral Coefficients<br />

th th<br />

i The mel-spectrum of the m frame for the r filter ( r = 1, 2,..., R)<br />

is defined as:<br />

Ur<br />

1<br />

2<br />

MFm[]<br />

r = ∑ | Vr[] k Xm[]| k<br />

Ar<br />

k= Lr<br />

th<br />

where Vr[ k] is the weighting function for the r filter, ranging from<br />

DFT index Lr to Ur,<br />

and<br />

U Ur<br />

2<br />

Ar = ∑ | Vr[ k]|<br />

k= Lr<br />

th<br />

is the normalizing factor for the r mel-filter. (Normalization guarantees<br />

that if the input spectrum is flat, the mel-spectrum is flat).<br />

i A discrete cosine transform of the log magnitude of the filter outputs is<br />

computed to form the function mfcc [ n]<br />

as:<br />

R 1 ⎡2π⎛ 1⎞<br />

⎤<br />

mfccm[ n] = ∑log(<br />

MFm[<br />

r]) cos r n , n 1,2,..., Nmfcc<br />

R<br />

⎢ ⎜ + ⎟<br />

r=<br />

1<br />

R 2<br />

⎥ =<br />

⎣ ⎝ ⎠ ⎦<br />

i Typically N = 13 and R=<br />

24 for 4 kHz bandwidth speech signals. 103<br />

mfcc<br />

Homomorphic Vocoder<br />

• time-dependent complex cepstrum retains all the<br />

information of the time-dependent Fourier<br />

transform => exact representation of speech<br />

• time dependent real cepstrum loses phase<br />

information -> > not an exact representation of<br />

speech<br />

• quantization of cepstral parameters also loses<br />

information<br />

• cepstrum gives good estimates of pitch, voicing,<br />

formants => can build homomorphic vocoder<br />

Homomorphic Vocoder<br />

• l(n) is cepstrum window that selects low-time<br />

values and is of length 26 samples homomorphic<br />

vocoder<br />

105<br />

107<br />

Delta Cepstrum<br />

i The set of mel frequency cepstral coefficients provide perceptually<br />

meaningful and smooth estimates of speech spectra, over time<br />

i Since speech is inherently a dynamic signal, it is reasonable to seek<br />

a representation that includes some aspect of the dynamic nature of<br />

the time derivatives (both first and second order derivatives) of the shortterm<br />

cepstrum<br />

i The resulting parameter sets are called the<br />

delta cepstrum (first derivative)<br />

and the delta-delta cepstrum (second derivative).<br />

i Th The simplest i l t method th d of f computing ti ddelta lt cepstrum t parameters t is i a first fi t<br />

difference of cepstral vectors, of the form:<br />

Δ mfccm[ n] = mfccm[ n] −mfccm−1[<br />

n]<br />

i The simple difference is a poor approximation to the first derivative and is<br />

not generally used. Instead a least-squares approximation to the local slope<br />

(over a region<br />

around the current sample) is used, and is of the form:<br />

M<br />

∑ k( mfccm+<br />

k[<br />

n])<br />

k=−M Δ mfccm[<br />

n]<br />

=<br />

M<br />

2<br />

∑ k<br />

k=−M where the region is M frames before and after the current frame<br />

Homomorphic Vocoder<br />

1. compute cepstrum every 10-20 msec<br />

2. estimate pitch period and<br />

voiced/unvoiced decision<br />

3. quantize and encode low-time cepstral<br />

values l<br />

4. at synthesizer-get approximation to hv(n) or hu(n) from low time quantized cepstral<br />

values<br />

5. convolve hv(n) or hu(n) with excitation<br />

created from pitch, voiced/unvoiced, and<br />

amplitude information<br />

Homomorphic Vocoder Impulse Responses<br />

104<br />

106<br />

108<br />

18


Summary<br />

• Introduced the concept of the cepstrum of a signal,<br />

defined as the inverse Fourier transform of the log of the<br />

signal spectrum<br />

1<br />

j<br />

ˆ[ ] log ( )<br />

xn F X e ω<br />

− = ⎡<br />

⎣<br />

⎤<br />

⎦<br />

• Showed cepstrum p reflected pproperties p of both the<br />

excitation (high quefrency) and the vocal tract (low<br />

quefrency)<br />

– short quefrency window filters out excitation; long quefrency<br />

window filters out vocal tract<br />

• Mel-scale cepstral coefficients used as feature set for<br />

speech recognition<br />

• Delta and delta-delta cepstral coefficients used as<br />

indicators of spectral change over time 109<br />

19

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!