LR Rabiner and RW Schafer, June 3

DRAFT: L. R. Rabiner and R. W. Schafer, June 3, 2009

8.5. HOMOMORPHIC FILTERING OF NATURAL SPEECH

[Figure 8.34: three panels, each plotting Amplitude against Time (Samples) from −200 to 200: (a) Zero-Phase Impulse Response Estimate; (b) Minimum-Phase Impulse Response Estimate; (c) Maximum-Phase Impulse Response Estimate.]

Figure 8.34: Homomorphic filtering of voiced speech; (a) Zero-phase estimate of $h_V[n]$; (b) Minimum-phase estimate of $h_V[n]$; (c) Maximum-phase estimate of $h_V[n]$.

8.5.5 Unvoiced Speech Analysis using the DFT

To complete the illustration of homomorphic analysis of natural speech, consider
the example of unvoiced speech given in Figure 8.35. Figure 8.35a shows a
waveform segment of the fricative /SH/ multiplied by a 401-point Hamming
window. The rapidly varying curve plotted with the thin line in Figure 8.35b
is the corresponding log magnitude function $\log |X(e^{j\omega})|$. Figure 8.35c shows
the corresponding cepstrum $c[n]$. For consistency, and since we generally do not
know in advance whether a particular speech segment is voiced or unvoiced, $c[n]$
for unvoiced speech is computed as the inverse Fourier transform of $\log |X(e^{j\omega})|$,
just as for voiced speech. Note the erratic variation of the log magnitude function
(log periodogram). It is clear from Figure 8.35c that, in contrast to the case
of voiced speech, the cepstrum of an unvoiced speech segment does not display
any sharp peaks in the high-quefrency region. Instead, the high quefrencies
represent the rapid random fluctuations in Figure 8.35b. However, the low-time
portion of the cepstrum can still be assumed to represent $\log |H_U(e^{j\omega})|$. This
is illustrated in Figure 8.35b by the smooth curve plotted with the thick line,
which represents the smoothed log magnitude function obtained by applying the
lowpass cepstrum window of Eq. (8.96) to the cepstrum of Figure 8.35c with
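
The cepstrum computation and low-quefrency smoothing described in this section can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the procedure used to produce Figure 8.35: the input is synthetic shaped noise standing in for an unvoiced segment, the 1024-point DFT length and the cutoff `n_co = 50` are arbitrary choices, and the rectangular low-quefrency window is a stand-in that may differ from the window of Eq. (8.96), which is not reproduced in this excerpt. The helper names `real_cepstrum`, `lowpass_lifter`, and `smoothed_log_magnitude` are hypothetical.

```python
import numpy as np

def real_cepstrum(frame, nfft=1024):
    """Return (log magnitude, cepstrum) of a windowed frame via an nfft-point DFT."""
    spectrum = np.fft.fft(frame, nfft)
    # Floor the magnitude to avoid log(0); the floor value is an arbitrary choice.
    log_mag = np.log(np.maximum(np.abs(spectrum), 1e-12))
    # The log magnitude of a real signal is real and even, so its inverse DFT is real.
    cepstrum = np.fft.ifft(log_mag).real
    return log_mag, cepstrum

def lowpass_lifter(cepstrum, n_co):
    """Keep quefrencies |n| <= n_co (assumed rectangular window, indices modulo nfft)."""
    nfft = len(cepstrum)
    window = np.zeros(nfft)
    window[: n_co + 1] = 1.0
    window[nfft - n_co:] = 1.0  # negative quefrencies wrap to the end of the array
    return cepstrum * window

def smoothed_log_magnitude(cepstrum, n_co):
    """DFT of the low-quefrency part of the cepstrum: a smoothed log magnitude."""
    return np.fft.fft(lowpass_lifter(cepstrum, n_co)).real

# Synthetic stand-in for an unvoiced segment (assumption): white noise passed
# through a simple first-difference-like filter, then a 401-point Hamming window.
rng = np.random.default_rng(0)
noise = rng.standard_normal(401)
shaped = np.convolve(noise, [1.0, -0.9])[:401]
frame = shaped * np.hamming(401)

log_mag, c = real_cepstrum(frame)
smooth = smoothed_log_magnitude(c, n_co=50)
```

Plotting `log_mag` and `smooth` on the same axes should show the erratic log periodogram with a slowly varying curve through it, which is the qualitative behavior described for the thin and thick lines of Figure 8.35b.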
