LR Rabiner and RW Schafer, June 3
DRAFT: L. R. Rabiner and R. W. Schafer, June 3, 2009
8.3. HOMOMORPHIC ANALYSIS OF THE SPEECH MODEL
[Figure 8.17 plot: four panels, each plotted against quefrency nT in ms. (a) Glottal Pulse Complex Cepstrum, (b) Vocal Tract Complex Cepstrum, and (c) Radiation Load Complex Cepstrum span −5 to 5 ms; (d) Voiced Excitation Complex Cepstrum spans −5 to 25 ms.]
Figure 8.17: Complex cepstra of the speech model: (a) Glottal pulse $\hat{g}[n]$, (b)
Vocal tract impulse response $\hat{v}[n]$, (c) Radiation load impulse response $\hat{r}[n]$, and
(d) Periodic excitation $\hat{p}[n]$.
from which it follows that

\[
\hat{p}[n] = \sum_{k=1}^{\infty} \frac{\beta^{k}}{k}\, \delta[n - kN_p]. \tag{8.46}
\]
As seen in Figure 8.17(d), the spacing between impulses in the complex cepstrum
due to the input $p[n]$ is $N_p = 80$ samples. At a sampling rate of 10,000 samples/s,
this corresponds to a pitch period of $1/F_0 = 80/10000$ s $= 8$ ms. Note that in
Figure 8.17 the discrete quefrency index is shown in ms; i.e., the horizontal axis shows $nT$.
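As a minimal sketch of Eq. (8.46), the following builds $\hat{p}[n]$ directly from the series and converts the impulse spacing to ms. The function name `excitation_complex_cepstrum` is hypothetical, and the value `beta = 0.999` is an assumption chosen only for illustration; the values $N_p = 80$ and a 10,000 samples/s sampling rate are taken from the text.

```python
import numpy as np

def excitation_complex_cepstrum(beta, Np, n_max):
    """Return p_hat[n] for n = 0..n_max-1 following Eq. (8.46):
    an impulse of size beta**k / k at each n = k*Np, k >= 1."""
    p_hat = np.zeros(n_max)
    # Largest k such that k*Np still falls inside the array.
    k = np.arange(1, (n_max - 1) // Np + 1)
    p_hat[k * Np] = beta ** k / k
    return p_hat

fs = 10000      # sampling rate in samples/s (from the text)
Np = 80         # pitch period in samples (from the text)
beta = 0.999    # ASSUMED value, for illustration only

p_hat = excitation_complex_cepstrum(beta, Np, 256)
peaks = np.flatnonzero(p_hat)      # impulse locations: multiples of Np
period_ms = 1000 * Np / fs         # 8.0 ms
```

The impulses fall only at multiples of $N_p$, and their heights decay as $\beta^k/k$, so the first impulse is the largest.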
According to Eq. (8.44), the complex cepstrum of the synthetic speech output
is the sum of all of the complex cepstra in Figure 8.17. Thus, $\hat{s}[n] =
\hat{h}_V[n] + \hat{p}[n]$ is depicted in Figure 8.18(a). The cepstrum, which is the even part
of $\hat{s}[n]$, is depicted in Figure 8.18(b). Note that in both cases, the impulses due
to the periodic excitation tend to stand out from the contributions due to the
system impulse response. The first impulse peak is located at quefrency
$N_p$, which is the period of the excitation. This is the basis for the use of the
cepstrum or complex cepstrum for pitch detection; i.e., the presence of a strong
peak signals voiced speech, and its quefrency is an estimate of the pitch period.
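The pitch detection idea described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the Hamming window, the 2.5 to 15 ms search range, the FFT size, and the single-resonance "vocal tract" used to synthesize a test signal are all assumptions. Only $N_p = 80$ and the 10,000 samples/s sampling rate come from the text.

```python
import numpy as np

def real_cepstrum(x, nfft):
    """Cepstrum: inverse DFT of the log magnitude of the DFT."""
    spectrum = np.fft.fft(x, nfft)
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # small offset avoids log(0)
    return np.fft.ifft(log_mag).real

def cepstral_pitch(frame, fs, min_ms=2.5, max_ms=15.0, nfft=2048):
    """Return (quefrency index of the largest cepstral peak, peak value).
    The search range in ms is an ASSUMED plausible pitch-period range."""
    windowed = frame * np.hamming(len(frame))
    c = real_cepstrum(windowed, nfft)
    lo = int(round(min_ms * 1e-3 * fs))
    hi = int(round(max_ms * 1e-3 * fs))
    peak = lo + int(np.argmax(c[lo:hi + 1]))
    return peak, c[peak]

# Synthetic voiced signal: impulse train with period Np through a
# decaying resonance (an ASSUMED stand-in for the vocal tract).
fs, Np = 10000, 80
excitation = np.zeros(2000)
excitation[::Np] = 1.0
n = np.arange(200)
h = 0.95 ** n * np.cos(2 * np.pi * 500 / fs * n)
speech = np.convolve(excitation, h)[:len(excitation)]

frame = speech[400:800]                 # 40 ms analysis frame
Np_est, peak_val = cepstral_pitch(frame, fs)
pitch_period_ms = 1000 * Np_est / fs    # estimated pitch period in ms
```

For this synthetic input the peak should land at or near quefrency index 80. A voiced/unvoiced decision would compare `peak_val` against a threshold, which is left unspecified here because the text does not give one.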
Finally, it is worthwhile to connect the z-transform analysis employed in
this example to the discrete-time Fourier transform representation of the complex
cepstrum. This is depicted in Figures 8.19(a) and 8.19(b), which show