LR Rabiner and RW Schafer, June 3
DRAFT: L. R. Rabiner and R. W. Schafer, June 3, 2009
8.5. HOMOMORPHIC FILTERING OF NATURAL SPEECH
(Figure 8.34 plots, amplitude versus time in samples from -200 to 200: (a) Zero-Phase Impulse Response Estimate; (b) Minimum-Phase Impulse Response Estimate; (c) Maximum-Phase Impulse Response Estimate.)
Figure 8.34: Homomorphic filtering of voiced speech: (a) zero-phase estimate
of $h_V[n]$; (b) minimum-phase estimate of $h_V[n]$; (c) maximum-phase estimate
of $h_V[n]$.
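The caption names three estimates of $h_V[n]$ that differ only in phase. As a rough illustration, the Python/NumPy sketch below derives all three from the real cepstrum of a windowed segment. It assumes a rectangular lowpass lifter with a hypothetical cutoff `n_co` (the window actually used in the text may differ) and the standard cepstral folding rule for the minimum- and maximum-phase cases. The function name `cepstral_phase_estimates`, the FFT length `n_fft`, and the synthetic noise input are illustrative, not taken from the source.

```python
import numpy as np


def cepstral_phase_estimates(x, n_fft=4096, n_co=30):
    """Zero-, minimum-, and maximum-phase estimates of an impulse response
    from the real cepstrum of a windowed segment x (a sketch).

    n_fft and n_co (lowpass lifter cutoff, in samples) are hypothetical
    choices, not values taken from the text. Outputs are in DFT order.
    """
    if len(x) > n_fft:
        raise ValueError("n_fft must be at least len(x)")
    if not 0 < n_co < n_fft // 2:
        raise ValueError("n_co must lie in (0, n_fft // 2)")

    # Real cepstrum: inverse DFT of log|X[k]|; the floor guards against log(0).
    X = np.fft.fft(x, n_fft)
    c = np.real(np.fft.ifft(np.log(np.maximum(np.abs(X), 1e-12))))

    # Signed quefrency of each DFT bin: 0, 1, ..., N/2 - 1, -N/2, ..., -1.
    k = np.arange(n_fft)
    n = np.where(k < n_fft // 2, k, k - n_fft)
    low = np.abs(n) < n_co

    # Assumed rectangular lifters. The symmetric one keeps c[n] for |n| < n_co;
    # the folded ones keep c[0] and double c[n] on one side only, which maps
    # the even real cepstrum to a causal (minimum-phase) or anticausal
    # (maximum-phase) complex cepstrum with the same log magnitude.
    l_zero = low.astype(float)
    l_min = np.where(n == 0, 1.0, np.where(low & (n > 0), 2.0, 0.0))
    l_max = np.where(n == 0, 1.0, np.where(low & (n < 0), 2.0, 0.0))

    def to_impulse(lifter):
        # DFT of the liftered cepstrum is log|H| + j*arg(H); exponentiate,
        # then invert. The imaginary part left by ifft is only roundoff.
        return np.real(np.fft.ifft(np.exp(np.fft.fft(lifter * c))))

    return to_impulse(l_zero), to_impulse(l_min), to_impulse(l_max)


# Example on a synthetic noise segment with a 401-point Hamming window.
rng = np.random.default_rng(0)
x = np.hamming(401) * rng.standard_normal(401)
h_zero, h_min, h_max = cepstral_phase_estimates(x)
h_zero_centered = np.fft.fftshift(h_zero)  # time index -N/2 ... N/2 - 1
```

Because the same smoothed log magnitude underlies all three lifters, the three estimates share one magnitude response, and under this construction the maximum-phase estimate is the time reverse of the minimum-phase one. Applying `np.fft.fftshift` centers each output on $n = 0$ for plotting against a symmetric time axis.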
8.5.5 Unvoiced Speech Analysis using the DFT<br />
To complete the illustration of homomorphic analysis of natural speech, consider
the example of unvoiced speech given in Figure 8.35. Figure 8.35a shows a
waveform segment of the fricative /SH/ multiplied by a 401-point Hamming
window. The rapidly varying curve plotted with the thin line in Figure 8.35b
is the corresponding log magnitude function $\log|X(e^{j\omega})|$, and Figure 8.35c shows
the corresponding cepstrum $c[n]$. For consistency, and since we generally do not
know in advance whether a particular speech segment is voiced or unvoiced, $c[n]$
for unvoiced speech is computed as the inverse Fourier transform of $\log|X(e^{j\omega})|$,
just as for voiced speech. Note the erratic variation of the log magnitude function
(the log periodogram). It is clear from Figure 8.35c that, in contrast to the case
of voiced speech, the cepstrum of an unvoiced speech segment does not display
any sharp peaks in the high-quefrency region. Instead, the high quefrencies
represent the rapid random fluctuations of the log magnitude in Figure 8.35b. However, the low-time
portion of the cepstrum can still be assumed to represent $\log|H_U(e^{j\omega})|$. This
is illustrated in Figure 8.35b by the smooth curve plotted with the thick line,
which represents the smoothed log magnitude function obtained by applying the
lowpass cepstrum window of Eq. (8.96) to the cepstrum of Figure 8.35c with