Lecture 12_winter_2012_6tp.pdf
Lecture 12_winter_2012_6tp.pdf
Lecture 12_winter_2012_6tp.pdf
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
Digital Speech Processing<br />
Processing—<br />
<strong>Lecture</strong> <strong>12</strong><br />
Homomorphic<br />
Speech Processing<br />
Basic Speech Model<br />
• short segment of speech can be modeled as<br />
having been generated by exciting an LTI system<br />
either by a quasi-periodic impulse train, or a<br />
random noise signal<br />
• speech analysis => estimate parameters of the<br />
speech h model, d l measure th their i variations i ti ( (and d<br />
perhaps even their statistical variabilites-for<br />
quantization) with time<br />
• speech = excitation * system response<br />
=> want to deconvolve speech into excitation and<br />
system response => do this using homomorphic<br />
filtering methods<br />
Generalized Superposition for Convolution<br />
x[ n]<br />
*<br />
H { }<br />
*<br />
y[ n]<br />
= H { x[<br />
n]<br />
}<br />
x1[ n]<br />
∗ x2[<br />
n]<br />
H { x1[ n]<br />
} ∗H<br />
{ x2[<br />
n]<br />
}<br />
• for LTI systems we have the result<br />
yn [ ] = xn [ ] ∗ hn [ ] = xkhn [ ] [ −k]<br />
∞<br />
∑<br />
k =−∞<br />
• "generalized" superposition => addition replaced by convolution<br />
xn [ ] = x1[ n] ∗x2[<br />
n]<br />
yn [ ] = H { x[ n]} = H { x1[ n]} ∗H<br />
{ x2[<br />
n]}<br />
• homomorphic system for<br />
convolution<br />
1<br />
3<br />
5<br />
General Discrete Discrete-Time Time Model of<br />
Speech Production<br />
pL[ n] = p[ n] ∗hV[ n]<br />
−− Voiced Speech<br />
hV[ n] = AV ⋅g[ n] ∗v[ n] ∗r[<br />
n]<br />
pL[ n] = u[ n] ∗hU[ n]<br />
−− Unvoiced Speech<br />
hU[ n] = AN ⋅v[ n] ∗r[<br />
n]<br />
2<br />
Superposition Principle<br />
+ +<br />
x[ n]<br />
x 1[<br />
n]<br />
+ x2[<br />
n]<br />
L{<br />
}<br />
y[ n]<br />
= L{<br />
x[<br />
n]}<br />
L { x1[ n]<br />
} + L{<br />
x2[<br />
n]<br />
}<br />
xn [ ] = ax1[ n] + bx2[ n]<br />
y[ n] = L{ x[ n]} = aL{ x [ n]} + bL{ x [ n]}<br />
1 2<br />
Homomorphic Filter<br />
• homomorphic filter => homomorphic system ⎡<br />
⎣H⎤ ⎦ that<br />
passes the desired signal unaltered, while removing the<br />
undesired signal<br />
x ( n ) = x 1 1[ n] ∗ x 2 2[ n] - with x 1 1[<br />
n]<br />
the "undesired" signal g<br />
H { xn [ ]} = H { x1[ n]} ∗H{<br />
x2[ n]}<br />
H { x1[ n]} → δ(<br />
n) - removal of x1[ n]<br />
H { x2[ n]} → x2[ n]<br />
H { xn [ ]} = δ[<br />
n] ∗ x2[ n] = x2[ n]<br />
• for linear systems this is analogous to additive noise removal<br />
4<br />
6<br />
1
x[ n]<br />
x n x n<br />
Canonic Form for Homomorphic<br />
Deconvolution<br />
* + + + + *<br />
D ∗{<br />
} L{<br />
}<br />
−1<br />
D { }<br />
xˆ[ n]<br />
yn ˆ[ ] ∗ yn [ ]<br />
ˆ[ ] ˆ [ ]<br />
ˆ [ ] ˆ [ ]<br />
1[ ] ∗ 2[<br />
] x1 n + x2 n y1 n + y2 n<br />
1 ∗ 2<br />
y [ n] y [ n]<br />
• any homomorphic system can be represented as a cascade<br />
of three systems, e.g., for convolution<br />
1. system takes inputs combined by convolution and transforms<br />
them into additive outputs<br />
2. system is a conventional linear system<br />
3. inverse of first system--takes additive inputs and transforms<br />
them into convolutional outputs<br />
Properties of Characteristic<br />
Systems<br />
xn ˆ[ ] = D∗{ xn [ ]} = D∗{<br />
x1[ n] ∗x2[<br />
n]}<br />
= D∗{ x1[ n]} + D∗{<br />
x2[ n]}<br />
= x xˆ [ n ] + x xˆ [ n ]<br />
1 2<br />
D ˆ D ˆ ˆ<br />
{ y [ n]} { y [ n]}<br />
−1 ∗ { yn [ ]} =<br />
−1<br />
∗ { y1[ n] + y2[ n]}<br />
−1 = D∗ ˆ1 −1<br />
∗D∗<br />
ˆ2<br />
= y1[ n] ∗ y2[ n] = y[ n]<br />
Canonic Form for Deconvolution Using DTFTs<br />
• need to find a system that converts convolution to addition<br />
xn [ ] = x1[ n] ∗x2[<br />
n]<br />
jω jω jω<br />
Xe ( ) = X1( e ) ⋅X2(<br />
e )<br />
• since<br />
D { [ ]} ˆ ˆ ˆ<br />
∗ xn = x1[ n] + x2[ n] = xn [ ]<br />
⎡ jω ( ) ⎤ ˆ jω ˆ jω ˆ jω<br />
D∗<br />
Xe = X1( e ) + X2( e ) = Xe ( )<br />
⎣ ⎦<br />
=> use log function which<br />
converts products to sums<br />
ˆ jω ( ) log ⎡ jω ( ) ⎤ log ⎡ jω jω<br />
Xe = Xe = X1( e ) ⋅ X2( e ) ⎤<br />
⎣ ⎦ ⎣ ⎦<br />
log ⎡ jω 1( ) ⎤ log ⎡ jω ˆ ˆ<br />
2( ) ⎤ jω jω<br />
= X e + X e = X1( e ) + X2( e )<br />
⎣ ⎦ ⎣ ⎦<br />
ˆ jω ( ) ⎡ ˆ jω ˆ jω ˆ ω ˆ ω<br />
1( ) 2( ) ⎤ j j<br />
Ye = L X e + X e = Y1( e ) + Y2( e )<br />
⎣ ⎦<br />
jω ( ) exp ˆ jω = ˆ j<br />
Ye ⎡ ω<br />
Y1( e ) + Y2( e ) ⎤ jω jω<br />
= Y1( e ) ⋅Y2(<br />
e )<br />
⎣ ⎦<br />
7<br />
9<br />
Canonic Form for Homomorphic Convolution<br />
x[ n]<br />
x n x n<br />
* + + + + *<br />
D ∗{<br />
} L{<br />
}<br />
−1<br />
D { }<br />
xˆ[ n]<br />
yn ˆ[ ] ∗ yn [ ]<br />
ˆ ˆ<br />
ˆ ˆ<br />
1[ ] ∗ 2[<br />
] x1[ n] + x2[ n]<br />
y1[ n] + y2[ n]<br />
1 ∗ 2<br />
y [ n] y [ n]<br />
xn [ ] = x 1 [ n ] ∗ x 2 [ n ]<br />
- convolutional relation<br />
xn ˆ[ ] = D ˆ ˆ<br />
∗{<br />
xn [ ]} = x1[ n] + x2[ n]<br />
- additive relation<br />
yn ˆ[ ] = L{<br />
xˆ ˆ ˆ ˆ<br />
1[ n] + x2[ n]} = y1[ n] + y2[ n]<br />
- conventional linear system<br />
−1<br />
y[ n] = D { ˆ ˆ<br />
∗ y1[ n] + y2[ n]} = y1[ n] ∗y2[<br />
n]<br />
- inverse of convolutional relation<br />
=> design converted back to linear system, L<br />
D∗<br />
⎡⎣ ⎤⎦<br />
- fixed (called the characteristic system for homomorphic deconvolution)<br />
−1<br />
D∗ ⎡⎣ ⎤⎦<br />
- fixed (characteristic system for inverse homomorphic deconvolution)<br />
Discrete Discrete-Time Time Fourier<br />
Transform Representations<br />
p<br />
Characteristic System for<br />
Deconvolution Using DTFTs<br />
jω ∞<br />
∑<br />
n=−∞<br />
− jωn Xe ( ) = xne [ ]<br />
ˆ jω jω jω jω<br />
Xe ( ) = log ⎡<br />
⎣<br />
Xe ( ) ⎤<br />
⎦<br />
= log Xe ( ) + jarg ⎡<br />
⎣<br />
Xe ( ) ⎤<br />
⎦<br />
π<br />
1 ˆ jω jωn xn ˆ[ ] = ( ) ω<br />
2π<br />
∫ Xe e d<br />
−π<br />
8<br />
10<br />
<strong>12</strong><br />
2
Inverse Characteristic System for Deconvolution Using<br />
DTFTs<br />
jω ∞<br />
∑<br />
n=−∞<br />
− jωn Ye ˆ( ) = yne ˆ[<br />
]<br />
jω ( ) exp ˆ jω<br />
Ye = ⎡ ( ) ⎤<br />
⎣<br />
Ye<br />
⎦<br />
π<br />
1<br />
jω jωn yn [ ] = ( ) ω<br />
2π<br />
∫ Ye e d<br />
−π<br />
Problems with arg Function<br />
13<br />
{ (<br />
jω ) } ≠ { 1(<br />
jω<br />
) }<br />
jω<br />
+ ARG { X 2 ( e ) }<br />
ARG X e ARG X e<br />
jω jω<br />
{ Xe } = { X1e }<br />
jω<br />
+ arg { X2( e ) }<br />
arg ( ) arg ( )<br />
Complex and Real Cepstrum<br />
define the inverse Fourier transform of ˆ jω<br />
•<br />
Xe ( ) as<br />
π<br />
1<br />
ˆ[ ] ˆ jω jωn xn = ( ) ω<br />
2π<br />
∫ Xe e d<br />
−π<br />
• where xn ˆ[<br />
] called the "complex cepstrum" since a complex<br />
logarithm g is involved in the computation<br />
• can also define a "real<br />
cepstrum" using just the real part of<br />
the logarithm, giving<br />
π<br />
1<br />
[ ] Re ⎡ ˆ jω ( ) ⎤ jωn cn =<br />
ω<br />
2π<br />
∫ Xe e d<br />
⎣ ⎦<br />
−π<br />
π<br />
1<br />
jω jωn = log | ( ) | ω<br />
2π<br />
∫ Xe e d<br />
−π<br />
• can show that cn [ ] is the even part of xn ˆ[<br />
]<br />
15<br />
17<br />
Issues with Logarithms<br />
• it is essential that the logarithm obey the equation<br />
log ⎡ jω jω 1( ) 2( ) ⎤ log ⎡ jω 1( ) ⎤ log ⎡ jω<br />
X e ⋅ X e = X e + X2( e ) ⎤<br />
⎣ ⎦ ⎣ ⎦ ⎣ ⎦<br />
jω jω<br />
• this is trivial if X1( e ) and X2( e ) are real -- however usually<br />
jω jω<br />
X1( e ) and X2( e ) are complex<br />
• on the unit circle the complex log can be written in the form:<br />
arg ⎡ jω<br />
( ) ⎤<br />
jω jω<br />
j X e<br />
Xe ( ) = | Xe ( )| e ⎣ ⎦<br />
log ⎡ jω ( ) ⎤ ˆ jω ω ω<br />
( ) log ⎡ j<br />
= = | ( )| ⎤+ arg ⎡ j<br />
Xe Xe Xe j Xe ( ) ⎤<br />
⎣ ⎦ ⎣ ⎦ ⎣ ⎦<br />
• no problems with log magnitude term; uniqueness<br />
problems<br />
arise in defining the imaginary part of the log; can show that<br />
the imaginary part (the phase angle of the z-transform) needs<br />
to be a continuous odd function of ω<br />
Complex Cepstrum Properties<br />
i Given a complex logarithm that satisfies the phase continuity<br />
condition, we have:<br />
π<br />
1<br />
jω jω jωn xn ˆ[ ] = (log | X( e ) | jarg{ X( e )}) e dω<br />
2π<br />
∫<br />
+<br />
−π<br />
jω jω<br />
i If xn [ ] real, then log|X ( e ) | is an even function of ω and arg{ X( e )}<br />
is an odd function of ω.<br />
This means that the real and imaginary parts of<br />
the complex log have the appropriate symmetry for xn ˆ[ ] to be a real<br />
sequence, and xn ˆ[ ] can be represented as:<br />
xn ˆ[ ] = cn [ ] + dn [ ]<br />
jω<br />
where cn [ ] is the inverse DTFT of log |X ( e )| and the even part of xn ˆ[<br />
], and<br />
jω<br />
dn [ ] is the inverse DTFT of arg{ Xe ( )} and the odd part of xn ˆ[<br />
]:<br />
xn ˆ[ ] + xˆ[ −n] xn ˆ[ ] −xˆ[ −n]<br />
cn [ ] = ; dn [ ] =<br />
2 2<br />
Terminology<br />
• Spectrum – Fourier transform of signal autocorrelation<br />
• Cepstrum – inverse Fourier transform of log spectrum<br />
• Analysis – determining the spectrum of a signal<br />
• Alanysis – determining the cepstrum of a signal<br />
• Filtering – linear operation on time signal<br />
• Liftering – linear operation on cepstrum<br />
• Frequency – independent variable of spectrum<br />
• Quefrency – independent variable of cepstrum<br />
• Harmonic – integer multiple of fundamental frequency<br />
• Rahmonic – integer multiple of fundamental frequency<br />
14<br />
16<br />
18<br />
3
z-Transform Transform Representation<br />
i The z − transform of the signal:<br />
xn [ ] = x1[ n]* x2[ n]<br />
is of the form:<br />
X( z) = X1( z) ⋅X2(<br />
z)<br />
i With an appropriate definition of the complex log, we get:<br />
X Xˆ ( z ) = log{ X ( z )} = log{ X X1 ( z ) ⋅ X X2 ( z )}<br />
= log{ X1(<br />
z)} + log{ X2( z)}<br />
= Xˆ ( z) + Xˆ ( z)<br />
1 2<br />
Inverse Characteristic System for<br />
Deconvolution<br />
∞<br />
∑<br />
Yz ˆ( ) = ynz ˆ[<br />
]<br />
n=−∞<br />
−n<br />
[ ]<br />
Yz ( ) = exp ⎡ ˆ(<br />
) ⎤<br />
⎣<br />
Yz<br />
⎦<br />
= log Yz ( ) + jarg Yz ( )<br />
1<br />
n<br />
yn [ ] = ( )<br />
2π<br />
∫ Yzzdz<br />
j<br />
z-Transform Transform Cepstrum Alanysis<br />
i express X( z)<br />
as product of minimum-phase and<br />
maximum-phase signals, i.e.,<br />
i where<br />
X( z) = X ( z) ⋅z<br />
X ( z)<br />
min<br />
max<br />
−M<br />
0<br />
min max<br />
M<br />
i<br />
− 1<br />
∏ ( 1−<br />
k )<br />
A az<br />
X ( z)<br />
=<br />
k = 1<br />
N<br />
i<br />
−1<br />
∏(<br />
1−<br />
cz k )<br />
k = 1<br />
i all poles and zeros inside unit circle<br />
Mi<br />
Mi<br />
−1<br />
∏(<br />
) k ∏<br />
X ( z) = −b<br />
1−bz<br />
k = 1<br />
k = 1<br />
i all zeros outside unit circle<br />
( )<br />
k<br />
19<br />
21<br />
23<br />
Characteristic System for Deconvolution<br />
∞<br />
−n<br />
Xz ( ) = ∑ xnz [ ] =<br />
n=−∞<br />
jarg { X( z)<br />
}<br />
Xz ( ) e<br />
Xˆ ( z) = log X( z) = log X( z) + jarg X( z)<br />
[ ] [ ]<br />
1 ˆ n<br />
xn ˆ[ ] = ( )<br />
2π<br />
∫ X z zdz<br />
j<br />
z-Transform Transform Cepstrum Alanysis<br />
• consider digital systems with rational z-transforms of the general type<br />
Xz ( ) =<br />
M<br />
M<br />
A a z b z<br />
i<br />
0<br />
−1 ∏( 1− k ) ∏(<br />
1−<br />
−1 −1)<br />
k<br />
k= 1 k=<br />
1<br />
Ni<br />
−1<br />
∏(<br />
1−<br />
cz k )<br />
k = 1<br />
i we can express the above equation as:<br />
M0MiM0 −M0 −1 −1<br />
z A∏−b 1− 1−<br />
k ∏ akz ∏ bkz k= 1 k = 1<br />
k=<br />
1<br />
Xz ( ) =<br />
Ni<br />
−1<br />
∏(<br />
1−<br />
cz k )<br />
k = 1<br />
• with all coefficients ak, bk, ck < 1 => all ck<br />
poles and<br />
ak zeros are inside the unit circle; all bk<br />
zeros<br />
are outside the unit circle;<br />
( ) ( ) ( )<br />
z-Transform Transform Cepstrum Alanysis<br />
i can express xn [ ] as the convolution:<br />
xn [ ] = xmin[ n] ∗xmax[ n−M0] i minimum-phase component is causal<br />
x [ n ] = 0, , n<<br />
0<br />
min<br />
i maximum-phase component is anti-causal<br />
x [ n] = 0, n><br />
0<br />
max<br />
M0<br />
factor z is the shift in time ori<br />
−<br />
i gin by M 0<br />
samples required so that the overall sequence,<br />
xn<br />
[ ] be causal<br />
20<br />
22<br />
24<br />
4
z-Transform Transform Cepstrum Alanysis<br />
• the complex logarithm of Xz ( ) is<br />
( ) log ⎡⎣ ( ) ⎤⎦<br />
log |<br />
M0<br />
| ∑log<br />
|<br />
k = 1<br />
−1<br />
k |<br />
−M0<br />
log[ ]<br />
M M<br />
N<br />
Xz ˆ = Xz = A + b + z +<br />
i 0<br />
i<br />
− 1 − 1<br />
∑ ∑log ( 1− az k ) + ∑ ∑log ( 1−bz k ) − ∑ ∑log<br />
( 1−cz<br />
k )<br />
k= 1 k= 1 k=<br />
1<br />
• evaluating Xz ˆ ( ) on the unit circle we can ignore the term<br />
0<br />
related to log ⎡ jωM e ⎤ (as this contributes only to the imaginary<br />
⎣ ⎦<br />
part and is a linear phase shift)<br />
Cepstrum Properties<br />
1. complex cepstrum is non-zero and of infinite extent for<br />
both positive and negative n, even though x[ n]<br />
may be<br />
causal, or even of finite duration ( X( z)<br />
has only zeros).<br />
2. complex cepstrum is a decaying sequence that is bounded by:<br />
| n|<br />
α<br />
| xn ˆ[<br />
]| < β , for | n|<br />
→∞<br />
| n |<br />
3. zero-quefrency value of complex cepstrum (and the cepstrum)<br />
depends on the gain constant and the zeros outside the unit circle.<br />
Setting x ˆ[0] = 0 (and therefore c[0]<br />
= 0)<br />
is equivalent to normalizing<br />
the log magnitude spectrum to a gain constant of:<br />
M0<br />
−1<br />
A∏( − bk)<br />
= 1<br />
k = 1<br />
4. If X( z) has no zeros outside the unit circle (all bk=<br />
0), then:<br />
xn ˆ[ ] = 0, n<<br />
0 (minimum-phase<br />
signals)<br />
5. If X( z) has no poles or zeros inside the unit circle (all ak, ck<br />
= 0), then:<br />
xn ˆ[ ] = 0, n><br />
0 (maximum-phase signals)<br />
z-Transform Transform Cepstrum Alanysis<br />
• Example 2--consider<br />
the case of a digital system with a<br />
single zero outside the unit circle ( b < 1)<br />
x2( n) = δ( n) + bδ( n+<br />
1)<br />
X2 ( z ) = 1 + bz (zero at z =− 1 / b )<br />
ˆ<br />
2( ) = log [ 2(<br />
) ] = log(<br />
1+<br />
)<br />
∞ + 1<br />
( −1)<br />
= ∑ ( )<br />
= 1<br />
ˆ2<br />
n<br />
X z bz z b<br />
X z X z bz<br />
n n<br />
b z<br />
n n<br />
n+ 1 n<br />
( −1)<br />
b<br />
x ( n) = u( −n−1) n<br />
25<br />
27<br />
29<br />
z-Transform Transform Cepstrum Alanysis<br />
• we can then evaluate the remaining terms, use power series<br />
expansion for logarithmic terms (and take the inverse<br />
transform to give the complex cepstrum) giving:<br />
π<br />
1<br />
ˆ(<br />
) ˆ jω jωn xn = ( ) ω<br />
2π<br />
∫ Xe e d<br />
2π<br />
∫<br />
−π<br />
∞ n<br />
Z<br />
log(1 − Z) =− ∑ , | Z | < 1<br />
n=<br />
1 n<br />
M0<br />
−1<br />
= log | A | + ∑log<br />
| bk| k = 1<br />
n = 0<br />
Ni n Mi<br />
n<br />
ck ak<br />
= ∑ −<br />
n ∑ n<br />
k= 1 k=<br />
1<br />
n > 0<br />
M0 −n<br />
bk<br />
= ∑ n<br />
k = 1<br />
n < 0<br />
26<br />
z-Transform Transform Cepstrum Alanysis<br />
i The main z-transform formula for cepstrum alanysis is based on<br />
the power series expansion:<br />
∞ ( −1)<br />
∑<br />
n+<br />
1<br />
n<br />
log( 1+ x) = x x < 1<br />
n=<br />
1 n<br />
i EExample l 1<br />
--Apply A l thi this fformula l tto th the exponential ti l sequence<br />
n<br />
1<br />
x1( n)<br />
= aun ( ) ⇔ X1( z)<br />
= −1<br />
1−<br />
az<br />
n+<br />
1<br />
∞<br />
ˆ −1( −1)<br />
n −n<br />
X1( z) = log[ X1( z)] = −log( 1−<br />
az ) = −∑( −a)<br />
z<br />
n=<br />
1 n<br />
n ∞ n<br />
a 1<br />
ˆ 1( ) ( 1) ˆ<br />
− ⎛a ⎞ −n<br />
x n = u n− ⇔ X1( z) = −log( 1−<br />
az ) = ∑⎜<br />
⎟z<br />
n n=<br />
1 ⎝ n ⎠<br />
z-Transform Transform Cepstrum Alanysis for 2 Pulses<br />
• Example 3--an<br />
input sequence of two pulses of the form<br />
x3( n) = δ( n) + αδ( n− Np)<br />
( 0 < α < 1)<br />
−Np<br />
X3( z) = 1+<br />
αz<br />
ˆ<br />
−Np<br />
X3( z) = log ⎡⎣X3( z) ⎤⎦<br />
= log(<br />
1+<br />
αz<br />
)<br />
∞ n + 1<br />
( −1)<br />
n −nNp<br />
= ∑ α z<br />
n<br />
n=<br />
1<br />
∞<br />
k<br />
ˆ k + 1 α<br />
x3( n) = ∑(<br />
−1) δ ( n−kNp) k<br />
k = 1<br />
• the cepstrum is an impulse train with impulses spaced at p samples<br />
N<br />
N p<br />
2N 2Np 3N 3Np 28<br />
30<br />
5
Cepstrum for Train of Impulses<br />
• an important special case is a train of impulses<br />
of the form:<br />
∑ M<br />
xn ( ) = αδ(<br />
n−rN) r p<br />
r = 0<br />
M<br />
N<br />
α<br />
0<br />
−rNp<br />
∑ r<br />
r =<br />
Xz ( ) = z<br />
−Np −1<br />
• clearly Xz ( ) is a polynomial in z rather than z ;<br />
thus Xz ( ) can be expressed as a product of factors<br />
−Np<br />
Np<br />
of the form ( 1−az ) and ( 1−bz<br />
), giving a complex<br />
cepstrum, xn ˆ(<br />
), that is non-zero only at integer multiples of N<br />
z-Transform Transform Cepstrum Alanysis for<br />
Convolution of 3 Sequences<br />
i Example 5--consider<br />
the convolution of sequences 1, 2 and 3, i.e.,<br />
x5( n) = x1( n) ∗x2( n) ∗x3(<br />
n)<br />
n<br />
= ⎡<br />
⎣<br />
aun ( ) ⎤<br />
⎦<br />
∗ [ δ( n) + bδ( n+ 1)<br />
] ∗ ⎡<br />
⎣δ( n) + αδ(<br />
n−N) ⎤ p ⎦<br />
n n−N p n<br />
n− N p + 1<br />
α p α<br />
p<br />
= a u ( n ) + a u ( n n− N ) + ba u ( n n+ 1 ) + ba u ( n n− N + 1 )<br />
i The complex cepstrum is therefore the sum of the complex cepstra<br />
of the three sequences<br />
xˆ5( n) = xˆ1( n) + xˆ ˆ<br />
2( n) + x3( n)<br />
n ∞ k+ 1 k n+ 1 n<br />
a ( −1) α ( −1)<br />
b<br />
= un ( − 1) + ∑ δ ( n− kNp) + u( −n−1) n k n<br />
k = 1<br />
Homomorphic Analysis of<br />
Speech Model<br />
p<br />
31<br />
33<br />
35<br />
z-Transform Transform Cepstrum Alanysis for<br />
Convolution of 2 Sequences<br />
--consider the convolution of sequences 1 and 3, i.e.,<br />
4( ) 1( ) 3(<br />
) ( ) δ( ) αδ(<br />
)<br />
( ) α ( )<br />
The comple cepstr m is therefore the s mofthecomple<br />
−<br />
i Example 4<br />
n<br />
x n = x n ∗ x n = ⎡<br />
⎣<br />
a u n ⎤<br />
⎦<br />
∗ ⎡<br />
⎣ n + n−N ⎤ p ⎦<br />
n<br />
n Np<br />
= aun + a un−Np i The complex cepstrum is therefore the sum of the complex<br />
cepstra<br />
of the two sequences (since convolution in the time domain is<br />
converted to addition in the cepstral domain)<br />
xˆ ( n) = xˆ ( n) + xˆ ( n)<br />
4 1 3<br />
n ∞ k+ 1 k<br />
a ( −1)<br />
α<br />
= un ( − 1)<br />
+ ∑ δ ( n−kNp) n k<br />
k = 1<br />
Example: a=.9, b=.8, a=.7, Np=15<br />
( 1)<br />
+<br />
− b<br />
n<br />
n 1 n<br />
a<br />
n<br />
n<br />
∞<br />
∑<br />
k = 1<br />
32<br />
( 1 α )<br />
−Np<br />
( 1+<br />
bz)<br />
Xz ( ) = + z<br />
−1<br />
( 1−<br />
az )<br />
k+ 1 k<br />
( −1)<br />
α<br />
δ [ n−kNp] k<br />
Homomorphic Analysis of Speech Model<br />
• the transfer function for voiced speech is of the form<br />
HV( z) = AV ⋅G(<br />
z) V( z) R( z)<br />
i with effective impulse response for voiced speech<br />
h hV [ n ] = A AV ⋅ g [ n ] ∗ v [ n ] ∗ r [ n ]<br />
• similarly for unvoiced speech we have<br />
H U ( z) = AU⋅V( z) R( z)<br />
i with effective impulse response for unvoiced speech<br />
h [ n] = A ⋅v[ n] ∗r[<br />
n]<br />
U U<br />
34<br />
36<br />
6
Complex Cepstrum for Speech<br />
• the models for the speech components are as follows:<br />
Mi<br />
M0<br />
−M−1 ∏ 1− k ∏ 1−<br />
k<br />
k= 1 k=<br />
1<br />
Ni<br />
−1<br />
∏(<br />
1−<br />
cz k )<br />
k = 1<br />
a k = b k = 0<br />
Az ( a z ) ( b z)<br />
1. vocal tract: Vz ( ) =<br />
--for o voiced o ced speec speech, , oonly y po poles es => a b , aall<br />
k<br />
--unvoiced speech and nasals,<br />
need pole-zero model but all poles are<br />
inside the unit circle => ck<br />
< 1<br />
--all speech has complex poles and zeros that occur in complex conjugate<br />
pairs<br />
−1<br />
2. radiation model: Rz ( ) ≈1−z(high frequency emphasis)<br />
3. glottal pulse model: finite duration pulse with transform<br />
Li<br />
L0<br />
−1<br />
Gz ( ) = B∏( 1−αkz) ∏(<br />
1−βkz)<br />
k= 1 k=<br />
1<br />
with zeros both inside and outside the unit circle<br />
Simplified Speech Model<br />
• short-time speech model<br />
x[ n] = w[ n] ⋅ p[ n] ∗g[ n] ∗v[ n] ∗r[<br />
n]<br />
≈ pw [ n ] ∗ h hv [ n ]<br />
• short-time complex cepstrum<br />
xn ˆ[ ] = pˆ [ n] + gn ˆ[ ] + vn ˆ[<br />
] + rn ˆ[<br />
]<br />
w<br />
[ ]<br />
Time Domain Analysis<br />
37<br />
39<br />
41<br />
Complex Cepstrum for Voiced<br />
Speech<br />
• combination of vocal tract, glottal pulse and<br />
radiation will be non-minimum phase =><br />
complex cepstrum exists for all values of n<br />
• the complex cepstrum will decay rapidly for<br />
large n (due to polynomial terms in expansion<br />
of complex cepstrum)<br />
• effect of the voiced source is a periodic pulse<br />
train for multiples of the pitch period<br />
Analysis of Model for Voiced Speech<br />
i Assume sustained /AE/ vowel with fundamental frequency of <strong>12</strong>5 Hz<br />
i Use glottal pulse model of the form:<br />
⎧0.5<br />
[1 − cos( π ( n+ 1) / N1)] 0 ≤n≤ N1−1<br />
⎪<br />
gn [ ] = ⎨cos(0.5<br />
π ( n+ 1 −N1)/ N2) N1 ≤n≤ N1+ N2<br />
−2<br />
⎪<br />
⎩ 0<br />
otherwise<br />
N1 = 25, N2<br />
= 10<br />
⇒ 34 sample impulse response, with transform<br />
33 33<br />
−33 −1<br />
Gz ( ) = z ∏( −bk ) ∏(1<br />
−bz k ) ⇒ all roots outside unit circle ⇒ maximum phase<br />
k= 1 k=<br />
1<br />
i Vocal tract system specified by 5 formants (frequencies and bandwidths)<br />
1<br />
V( z)<br />
= 5<br />
−2πσ kT −1 −4πσ<br />
kT<br />
−2<br />
∏(1−<br />
2e cos(2 π FT k ) z + e z )<br />
k = 1<br />
{ F , σ } = [(660,60),(1720,100),(2410,<strong>12</strong>0),(3500,175),(4500,250)]<br />
k k<br />
i Radiation load is simple first difference<br />
−1<br />
( ) 1 , 0.96<br />
Rz = − γzγ =<br />
Pole Pole-Zero Zero Analysis of Model<br />
Components<br />
38<br />
40<br />
42<br />
7
Spectral Analysis of Model<br />
Complex Cepstrum of Model<br />
i The voiced speech signal is modeled as:<br />
xn [ ] = AV ⋅gn [ ] ∗vn [ ] ∗rn [ ] ∗pn<br />
[ ]<br />
i with complex cepstrum:<br />
sn ˆ[ ] = log| A | [ ] ˆ[ ] ˆ[ ] ˆ[ ] ˆ<br />
V δ n + gn + vn + rn + pn [ ]<br />
i glottal pulse is maximum phase ⇒ gn ˆ[ [ ] = 00, n > 0<br />
i vocal tract<br />
and radiation systems are minimum phase<br />
⇒ vn ˆ[ ] = 0, n< 0, rn ˆ[<br />
] = 0, n<<br />
0<br />
∞ k<br />
ˆ −N β<br />
p −kNp<br />
Pz ( ) =−log(1 − β z ) = ∑ z<br />
k = 1 k<br />
∞ k<br />
β<br />
pn ˆ[ ] = ∑ δ[<br />
n−kNp] k<br />
k = 1<br />
Resulting Complex and Real<br />
Cepstra<br />
43<br />
45<br />
47<br />
Speech Model Output<br />
Cepstral Analysis of Model<br />
Frequency Domain Representations<br />
44<br />
46<br />
48<br />
8
Frequency<br />
Frequency-Domain Domain Representation of<br />
Complex Cepstrum<br />
Inverse System- System DFT Implementation<br />
Cepstral Computation Aliasing<br />
49<br />
51<br />
N=256, N Np=75, =75,<br />
α=0.8 =0.8<br />
Circle dots are<br />
are<br />
cepstrum<br />
values in<br />
correct<br />
locations; all<br />
other dots are<br />
results of<br />
aliasing due to<br />
finite range<br />
computations<br />
53<br />
The Complex Cepstrum-DFT Cepstrum DFT Implementation<br />
2π j k<br />
N<br />
∞<br />
∑<br />
n n=−∞∞<br />
2π<br />
− j kn<br />
N<br />
Xk [ ] = Xe ( ) = xne [ ] k= 01 , ,..., N−1,<br />
jω<br />
i Xp[ k] is the N point DFT corresponding to X( e )<br />
Xk ˆ[ ] = Xe ˆ(<br />
) = log Xk [ ] = log Xk [ ] + jarg Xk [ ]<br />
j2πk/ N { } { }<br />
N−1<br />
∑ ˆ<br />
2π<br />
j kn<br />
N<br />
∞<br />
∑<br />
01 1<br />
1<br />
xn ˆ[<br />
] = Xke [ ] = xn ˆ[<br />
+ rN] n= , ,..., N−<br />
N k= 0<br />
r=−∞<br />
• x ˆ[<br />
n] is an aliased version of xˆ[ n]<br />
⇒use<br />
as large a value of N as possible to minimize aliasing<br />
The Cepstrum-DFT Cepstrum DFT Implementation<br />
π<br />
1<br />
jω jωn cn [ ] = log| ( )| ω<br />
2π<br />
∫ Xe e d −∞ < n<<br />
∞<br />
−π<br />
i Approximation to cepstrum using DFT:<br />
2π 2π<br />
j k<br />
∞<br />
− j kn<br />
N N<br />
Xk [ ] = Xe ( ) = ∑ xne [ ] k= 01 , ,..., N−1,<br />
n=−∞<br />
N −1<br />
1<br />
j2πkn/ N<br />
cn [<br />
] = ∑log|<br />
Xk [ ]| e , 0≤ n≤N−1 N k = 0<br />
∞<br />
cn (<br />
) = ∑ cn [ + rN] n= 01 , ,..., N−1<br />
r =−∞<br />
• cn [<br />
] is an aliased version of cn [ ] ⇒use<br />
as large a value of N<br />
as possible to minimize aliasing<br />
ˆ[ ] + ˆ<br />
<br />
xn x[ −n]<br />
cn ( ) =<br />
2<br />
Summary<br />
1. Homomorphic System for Convolution:<br />
z( ) log( ) L( ) exp( ) z-1 ^ ^ x[n] X(z) X(z)<br />
Y(z)<br />
Y(z)<br />
y[n]<br />
( )<br />
X(z)<br />
^<br />
z-1 ( )<br />
x[n] ^<br />
Linear<br />
System<br />
y[n] ^<br />
z( )<br />
Y(z)<br />
^<br />
2. Practical Case:<br />
z( ) → DFT<br />
−1<br />
z () → IDFT<br />
jω<br />
jω jω<br />
jarg { X( e ) }<br />
Xe ( ) = Xe ( ) e<br />
jω jω jω<br />
log ⎡<br />
⎣<br />
Xe ( ) ⎤<br />
⎦<br />
= log Xe ( ) + jarg Xe ( )<br />
{ }<br />
50<br />
52<br />
54<br />
9
Summary<br />
3. Complex Cepstrum:<br />
1 ˆ jω jωn xn ˆ[<br />
] = ( ) ω<br />
2 ∫ Xe e d<br />
44. CCepstrum: t<br />
π<br />
π −π<br />
π<br />
1<br />
jω jωn cn [ ] = log ( ) ω<br />
2π<br />
∫ Xe e d<br />
−π<br />
xn ˆ[ ] + xˆ[ −n]<br />
cn [ ] = = even part of xn ˆ[<br />
]<br />
2<br />
Complex Cepstrum Without Phase<br />
Unwrapping<br />
i short-time analysis uses finite-length windowed segments, x[ n]<br />
M<br />
−n<br />
th<br />
X( z) = ∑ x[ n] z , M -order polynomial<br />
n=<br />
0<br />
i Find polynomial roots<br />
Mi ∏<br />
M0<br />
−1 m ∏<br />
−1 −1<br />
m<br />
m= 1 m=<br />
1<br />
X( z) = x[0] (1 −a z ) (1 −b<br />
z )<br />
i am<br />
roots are inside unit circle (minimum-phase<br />
part)<br />
i bm<br />
roots are outside unit circle (maximum-phase part)<br />
−1 −1<br />
i Factor out terms of form − bmz giving:<br />
MiM0 −M 0<br />
−1<br />
X( z) = Az (1 −a z ) (1 −b<br />
z)<br />
m m<br />
m= 1 m=<br />
1<br />
M0<br />
M0<br />
−1<br />
∏ m<br />
m=<br />
1<br />
A= x[0]( −1)<br />
b<br />
∏ ∏<br />
i Use polynomial root finder<br />
to find the zeros that lie inside<br />
and outside the unit circle and solve directly for xn ˆ[ ] .<br />
Recursive Relation for Complex<br />
Cepstrum for Minimum Phase Signals<br />
• the complex cepstrum for minimum phase signals<br />
can be computed recursively from the input signal,<br />
xn ( ) using the relation<br />
ˆ( ) = 0 < 0<br />
= log ⎡⎣ ( 0) ⎤⎦<br />
= 0<br />
−1<br />
( ) ⎛ ⎞ ( − )<br />
= − ˆ( )<br />
0<br />
( 0) ∑⎜ ⎟<br />
><br />
⎝ ⎠ ( 0)<br />
n<br />
xn n<br />
x n<br />
xn k xn k<br />
xk n<br />
x n x<br />
k = 0<br />
55<br />
57<br />
59<br />
x(n)<br />
Summary<br />
~<br />
X(k X(k) ^ ^<br />
X(k X(k)<br />
x(n x(n)<br />
DFT Complex Log IDFT<br />
5. Practical Implementation of Complex Cepstrum:<br />
∞<br />
j( 2π/ N) k − j( 2π/<br />
N) kn<br />
Xk [ ] = Xe ( ) = ∑ xne [ ] )<br />
n=−∞<br />
X ˆ [ k] = log { Xp( k) } = log Xp( k) + jarg { Xp( k)<br />
}<br />
N−1∞<br />
1 ˆ j( 2π<br />
/ N<br />
ˆ ) kn<br />
xn [ ] = ∑Xke [ ] = ∑ xn ˆ[<br />
+ rN]<br />
⇒aliasing<br />
N k= 0<br />
r=−∞<br />
xn ˆ[ ] = aliased version<br />
of xn ˆ[<br />
]<br />
6. Examples:<br />
∞ r<br />
1<br />
a<br />
Xz ( ) = ⇔ xn ˆ[<br />
] = δ [ ]<br />
−1<br />
∑ n−r 1−<br />
az r = 1 r<br />
∞ r<br />
b<br />
Xz ( ) = 1−bz⇔<br />
xn ˆ[<br />
] = − ∑ δ [ n+ r]<br />
r = 1 r<br />
Cepstrum for Minimum Phase Signals<br />
• for minimum phase signals (no poles or zeros outside unit circle) the complex cepstrum<br />
can be completely represented by the real part of the Fourier transforms<br />
• this means we can represent the complex<br />
cepstrum of minimum phase signals by the log<br />
of the magnitude of the FT alone<br />
• since the real part of the FT is the FT of the even part of the sequence<br />
ˆ<br />
ˆ( ) ˆ(<br />
)<br />
Re ⎡ jω ⎡ + − ⎤<br />
( ) ⎤<br />
xn x n<br />
Xe = FT<br />
⎣ ⎦ ⎢<br />
2<br />
⎥<br />
⎣ 2 ⎦<br />
jω<br />
FT ⎡⎣c( n)<br />
⎤⎦<br />
= log Xe ( )<br />
xn ˆ( ) + xˆ( −n)<br />
cn ( ) =<br />
2<br />
• giving<br />
xn ˆ(<br />
) = 0 n<<br />
0<br />
= cn ( ) n=<br />
0<br />
= 2cn ( ) n><br />
0<br />
• thus the complex cepstrum (for minimum phase signals) can be computed by computing<br />
the cepstrum and using the equation above<br />
58<br />
Recursive Relation for Complex<br />
Cepstrum for Minimum Phase Signals<br />
xn ( ) ←⎯→ Xz ( ) 1. basic z-transform<br />
dX( z)<br />
nx( n) ←⎯→− z =−zX′<br />
( z) dz<br />
2. scale by n rule<br />
xn ˆ( ) ←⎯→ Xz ˆ ( ) = log [ Xz ( ) ] 3. definition of complex cepstrum<br />
dXˆ ( z) d X′ ( z)<br />
= [ log[ Xz ( )] ] =<br />
dz dz X( z)<br />
4. differentiation of z-transform<br />
dXˆ ( z)<br />
−z<br />
Xz ( ) =−zX′<br />
( z)<br />
dz<br />
5. multiply both sides of equation<br />
56<br />
60<br />
10
Recursive Relation for Complex<br />
Cepstrum for Minimum Phase Signals<br />
dXˆ ( z)<br />
nxˆ( n) ∗x( n) ←⎯→− z X( z) =−zX′ ( z) ←⎯→ nx( n)<br />
dz<br />
∞<br />
nx( n) = xˆ ( k) x( n −k)<br />
k<br />
∑<br />
k =−∞<br />
( )<br />
i for minimum phase systems we have xn ˆ(<br />
) = 0for n<<br />
0,<br />
xn ( ) = 0 for n < 0 0,<br />
giving:<br />
n<br />
( ) ˆ<br />
⎛k⎞ xn = ∑ xkxn ( ) ( −k) ⎜ ⎟<br />
k = 0<br />
⎝n⎠ i separating out the term for k = n we get:<br />
n−1<br />
ˆ<br />
⎛k⎞ xn ( ) = ∑ xkxn ( ) ( − k) ⎜ ⎟+<br />
x( 0)<br />
xn ˆ(<br />
)<br />
k = 0<br />
⎝n⎠ n−1<br />
xn ( ) xn ( − k) ˆ( ) ˆ<br />
⎛k⎞ xn = − ∑ xk ( ) , > 0<br />
( 0) = 0 ( 0)<br />
⎜ ⎟ n<br />
x k x ⎝n⎠ xˆ( 0) = log x( 0) , xˆ( n) = 0, n < 0<br />
[ ]<br />
Cepstrum for Maximum Phase Signals<br />
• for maximum phase signals (no poles or<br />
zeros inside unit circle)<br />
xn ˆ( ) + xˆ( −n)<br />
cn ( ) =<br />
2<br />
• giving<br />
xn ˆ(<br />
) = 0 n><br />
0<br />
= cn ( ) n=<br />
0<br />
= 2cn ( ) n<<br />
0<br />
• thus the complex cepstrum (for maximum<br />
phase signals) can be computed<br />
by computing<br />
the cepstrum and using the equation above<br />
Computing Short-Time<br />
Short Time<br />
Cepstrums from Speech<br />
U Using i Polynomial P l i l Roots R t<br />
61<br />
63<br />
65<br />
Recursive Relation for Complex<br />
Cepstrum for Minimum Phase Signals<br />
i why is xˆ( 0) = log[ x(<br />
0)]?<br />
i assume we have a finite sequence xn ( ), n= 01 , ,..., N−1<br />
i we can write xn ( ) as:<br />
xn ( ) = x( 0) δ( n) + x( 1) δ( n− 1) + ... + xN ( −1) δ(<br />
N−1)<br />
⎡ x() 1 x( N −1)<br />
⎤<br />
= x( 0) ⎢δ( n) + δ( n− 1) + ... + δ(<br />
N −1)<br />
⎣ x( 0) x(<br />
0)<br />
⎥<br />
⎦<br />
i taking z − transforms, we get:<br />
N−1<br />
NZ1 NZ 2<br />
−n−1 Xz ( ) = ∑ xnz ( ) = G∏( 1−az k ) ∏(<br />
1−bz<br />
k )<br />
n= 0 k= 1 k=<br />
1<br />
i where the first term is the gain, G = x(<br />
0),<br />
and the two product terms are the<br />
zeros inside and outside the<br />
unit circle.<br />
i for minimum phase systems we have all zeros inside the unit circle so the<br />
second product term is gone, and we have the result that<br />
xˆ( 0) = log[ G] = log[ x( 0)]; xˆ( n) = 0, n < 0<br />
NZ1 a<br />
xn ˆ( ) =<br />
0<br />
k=1<br />
n ⎛ ⎞ k − ∑⎜<br />
⎟ n ><br />
⎝ n ⎠<br />
Recursive Relation for Complex<br />
Cepstrum for Maximum Phase Signals<br />
• the complex cepstrum for maximum phase signals<br />
can be computed recursively from the input signal,<br />
xn ( ) using the relation<br />
xn ˆ( ) = 0 n><br />
0<br />
= log ⎡⎣x( 0) ⎤⎦<br />
n = 0<br />
xn ( )<br />
= −<br />
x( 0) 0<br />
⎛k⎞ xn ( − k)<br />
∑ ⎜ ⎟xk<br />
ˆ( )<br />
⎝n⎠ x(<br />
0)<br />
n<<br />
0<br />
k= n+<br />
1<br />
Cepstrum From Polynomial Roots<br />
62<br />
64<br />
66<br />
11
Cepstrum From Polynomial Roots<br />
Practical Considerations<br />
• window to define short-time analysis<br />
• window duration (should be several pitch<br />
periods long)<br />
•size i of f FFT (to (t minimize i i i aliasing) li i )<br />
• elimination of linear phase components<br />
(positioning signals within frames)<br />
• cutoff quefrency of lifter<br />
• type of lifter (low/high quefrency)<br />
Voiced Speech Example<br />
67<br />
69<br />
Hamming window<br />
40 msec duration<br />
(section beginning<br />
at sample 13000<br />
in file<br />
test_16k.wav)<br />
71<br />
Computing Short-Time<br />
Short Time<br />
Cepstrums from Speech<br />
Using the DFT<br />
Computational Considerations<br />
Voiced Speech Example<br />
68<br />
70<br />
wrapped phase<br />
unwrapped<br />
phase<br />
72<br />
<strong>12</strong>
Voiced Speech Example<br />
Complex Cepstrum of Speech<br />
• model of speech:<br />
– voiced speech produced by a quasi-periodic<br />
pulse train exciting slowly time-varying linear<br />
system y => p[n] p[ ] convolved with hv[n] v[ ]<br />
– unvoiced speech produced by random noise<br />
exciting slowly time-varying linear system =><br />
u[n] convolved with hv[n] • time to examine full model and see what<br />
the complex cepstrum of speech looks like<br />
Voiced Speech Example<br />
73<br />
75<br />
Cepstrally<br />
smoothed log<br />
magnitude, 50<br />
quefrencies<br />
cutoff<br />
Cepstrally<br />
unwrapped<br />
phase, 50<br />
quefrencies<br />
cutoff<br />
77<br />
Characteristic System for<br />
Homomorphic Convolution<br />
• still need to define (and design) the L<br />
operator part (the linear system<br />
component) p ) of the system y to completely p y<br />
define the characteristic system for<br />
homomorphic convolution for speech<br />
– to do this properly and correctly, need to look<br />
at the properties of the complex cepstrum for<br />
speech signals<br />
Homomorphic Filtering of Voiced<br />
Speech<br />
• goal is to separate out the<br />
excitation impulses from the<br />
remaining components of the<br />
complex cepstrum<br />
• use cepstral window, l(n), to<br />
separate excitation pulses from<br />
combined vocal tract<br />
– l(n)=1 for |n|
Voiced Speech Example<br />
High quefrency<br />
liftering; cutoff<br />
quefrency=50;<br />
log magnitude<br />
and unwrapped<br />
phase<br />
Unvoiced Speech Example<br />
79<br />
Hamming window<br />
40 msec duration<br />
(section beginning<br />
at sample 3200 in<br />
file test_16k.wav)<br />
Unvoiced Speech Example<br />
81<br />
83<br />
Voiced Speech Example<br />
Estimated<br />
excitation<br />
function for<br />
voiced speech<br />
(Hamming<br />
window<br />
weighted)<br />
Unvoiced Speech Example<br />
80<br />
wrapped phase<br />
unwrapped<br />
phase<br />
Unvoiced Speech Example<br />
82<br />
Cepstrally<br />
smoothed log<br />
magnitude, 50<br />
quefrencies<br />
cutoff<br />
Cepstrally<br />
unwrapped<br />
phase, 50<br />
quefrencies<br />
cutoff<br />
84<br />
14
Unvoiced Speech Example<br />
Estimated<br />
excitation source<br />
for unvoiced<br />
speech section<br />
(Hamming<br />
window<br />
weighted)<br />
Review of Cepstral Calculation<br />
• 3 potential methods for computing cepstral<br />
coefficients, xn ˆ[ ] , of sequence x[n]<br />
– analytical method; assuming X(z) is a rational<br />
function; find poles and zeros and expand using log<br />
power series i<br />
– recursion method; assuming X(z) is either a minimum<br />
phase (all poles and zeros inside unit circle) or<br />
maximum phase (all poles and zeros outside unit<br />
circle) sequence<br />
– DFT implementation; using windows, with phase<br />
unwrapping (for complex cepstra)<br />
Cepstral Computation Aliasing<br />
i Effect of quefrency aliasing via a simple example<br />
xn [ ] = δ[ n] + αδ[<br />
n−Np] i with discrete-time Fourier transform<br />
j ( ) ω<br />
j N<br />
X e 1 e ω<br />
α −<br />
= +<br />
p<br />
i We can express the complex logarithm as<br />
ˆ j<br />
j Np<br />
X( e ) log{1 e }<br />
ω<br />
ω<br />
α −<br />
∞ m+ 1 m ⎛( −1)<br />
α ⎞<br />
= + = ∑⎜<br />
⎟e<br />
m=<br />
1⎝<br />
m ⎠<br />
i giving a complex cepstrum in the form<br />
∞ m+ 1 m ⎛ ⎞<br />
( −1)<br />
α<br />
xn ˆ[ ] = ∑⎜<br />
⎟δ[<br />
n−mNp] m=<br />
1⎝<br />
m ⎠<br />
− jωmNp 85<br />
87<br />
89<br />
Short Short-Time Time Homomorphic Analysis<br />
STFT<br />
Example 11—single<br />
single pole sequence<br />
(computed using all 3 methods)<br />
[ ] = [ ]<br />
n<br />
xn aun<br />
n<br />
a<br />
xn ˆ[ ] = un [ −1]<br />
n<br />
Example 22—voiced<br />
voiced speech frame<br />
86<br />
88<br />
90<br />
15
Example 33—low<br />
low quefrency liftering<br />
Example 4—effects 4 effects of low quefrency lifter<br />
Example 66—phase<br />
phase unwrapping<br />
91<br />
93<br />
95<br />
Example 33—high<br />
high quefrency liftering<br />
Example 55—phase<br />
phase unwrapping<br />
Homomorphic Spectrum Smoothing<br />
92<br />
94<br />
96<br />
16
Running Cepstrum<br />
Running Cepstrums<br />
Cepstrum Distance Measures<br />
i The cepstrum forms a natural basis for comparing<br />
patterns in speech recognition or vector quantization<br />
because of its stable mathematical characterization<br />
for speech signals<br />
i A typical "cepstral distance<br />
measure" is of the form:<br />
nco<br />
2<br />
D = ([] cn −cn<br />
[])<br />
∑<br />
n=<br />
1<br />
where cn [ ] and cn [ ] are cepstral sequences corresponding<br />
to frames of signal, and D is the cepstral distance between<br />
the pair of sequences.<br />
i Using Parseval's theorem, we can express the cepstral<br />
distance in the frequency domain as:<br />
1 π<br />
jω jω<br />
2<br />
D (log | H( e ) | log | H( e ) |) dω<br />
2π<br />
π<br />
Thus we see that the cepstral distance is actually a log<br />
magnitude spe<br />
−<br />
= ∫<br />
−<br />
i<br />
ctral distance<br />
97<br />
99<br />
101<br />
Running Cepstrum<br />
Cepstrum Applications<br />
Mel Frequency Cepstral Coefficients<br />
i Basic idea is to compute a frequency analysis based on a filter<br />
bank with approximately critical band spacing of the filters and<br />
bandwidths. For 4 kHz bandwidth, approximately 20 filters are used.<br />
i First perform a short-time Fourier analysis, giving Xm[ k], k = 0,1,..., NF / 2<br />
where m is the frame number and k is the frequency index (1 to half<br />
the size of the FFT)<br />
i Next the DFT values are grouped g p together g in critical bands and weighted g<br />
by triangular weighting functions.<br />
98<br />
100<br />
102<br />
17
Mel Frequency Cepstral Coefficients<br />
th th<br />
i The mel-spectrum of the m frame for the r filter ( r = 1, 2,..., R)<br />
is defined as:<br />
Ur<br />
1<br />
2<br />
MFm[]<br />
r = ∑ | Vr[] k Xm[]| k<br />
Ar<br />
k= Lr<br />
th<br />
where Vr[ k] is the weighting function for the r filter, ranging from<br />
DFT index Lr to Ur,<br />
and<br />
U Ur<br />
2<br />
Ar = ∑ | Vr[ k]|<br />
k= Lr<br />
th<br />
is the normalizing factor for the r mel-filter. (Normalization guarantees<br />
that if the input spectrum is flat, the mel-spectrum is flat).<br />
i A discrete cosine transform of the log magnitude of the filter outputs is<br />
computed to form the function mfcc [ n]<br />
as:<br />
R 1 ⎡2π⎛ 1⎞<br />
⎤<br />
mfccm[ n] = ∑log(<br />
MFm[<br />
r]) cos r n , n 1,2,..., Nmfcc<br />
R<br />
⎢ ⎜ + ⎟<br />
r=<br />
1<br />
R 2<br />
⎥ =<br />
⎣ ⎝ ⎠ ⎦<br />
i Typically N = 13 and R=<br />
24 for 4 kHz bandwidth speech signals. 103<br />
mfcc<br />
Homomorphic Vocoder<br />
• time-dependent complex cepstrum retains all the<br />
information of the time-dependent Fourier<br />
transform => exact representation of speech<br />
• time dependent real cepstrum loses phase<br />
information -> > not an exact representation of<br />
speech<br />
• quantization of cepstral parameters also loses<br />
information<br />
• cepstrum gives good estimates of pitch, voicing,<br />
formants => can build homomorphic vocoder<br />
Homomorphic Vocoder<br />
• l(n) is cepstrum window that selects low-time<br />
values and is of length 26 samples homomorphic<br />
vocoder<br />
105<br />
107<br />
Delta Cepstrum<br />
i The set of mel frequency cepstral coefficients provide perceptually<br />
meaningful and smooth estimates of speech spectra, over time<br />
i Since speech is inherently a dynamic signal, it is reasonable to seek<br />
a representation that includes some aspect of the dynamic nature of<br />
the time derivatives (both first and second order derivatives) of the shortterm<br />
cepstrum<br />
i The resulting parameter sets are called the<br />
delta cepstrum (first derivative)<br />
and the delta-delta cepstrum (second derivative).<br />
i Th The simplest i l t method th d of f computing ti ddelta lt cepstrum t parameters t is i a first fi t<br />
difference of cepstral vectors, of the form:<br />
Δ mfccm[ n] = mfccm[ n] −mfccm−1[<br />
n]<br />
i The simple difference is a poor approximation to the first derivative and is<br />
not generally used. Instead a least-squares approximation to the local slope<br />
(over a region<br />
around the current sample) is used, and is of the form:<br />
M<br />
∑ k( mfccm+<br />
k[<br />
n])<br />
k=−M Δ mfccm[<br />
n]<br />
=<br />
M<br />
2<br />
∑ k<br />
k=−M where the region is M frames before and after the current frame<br />
Homomorphic Vocoder<br />
1. compute cepstrum every 10-20 msec<br />
2. estimate pitch period and<br />
voiced/unvoiced decision<br />
3. quantize and encode low-time cepstral<br />
values l<br />
4. at synthesizer-get approximation to hv(n) or hu(n) from low time quantized cepstral<br />
values<br />
5. convolve hv(n) or hu(n) with excitation<br />
created from pitch, voiced/unvoiced, and<br />
amplitude information<br />
Homomorphic Vocoder Impulse Responses<br />
104<br />
106<br />
108<br />
18
Summary<br />
• Introduced the concept of the cepstrum of a signal,<br />
defined as the inverse Fourier transform of the log of the<br />
signal spectrum<br />
1<br />
j<br />
ˆ[ ] log ( )<br />
xn F X e ω<br />
− = ⎡<br />
⎣<br />
⎤<br />
⎦<br />
• Showed cepstrum p reflected pproperties p of both the<br />
excitation (high quefrency) and the vocal tract (low<br />
quefrency)<br />
– short quefrency window filters out excitation; long quefrency<br />
window filters out vocal tract<br />
• Mel-scale cepstral coefficients used as feature set for<br />
speech recognition<br />
• Delta and delta-delta cepstral coefficients used as<br />
indicators of spectral change over time 109<br />
19