LR Rabiner and RW Schafer, June 3

DRAFT: L. R. Rabiner and R. W. Schafer, June 3, 2009
484 CHAPTER 8. THE CEPSTRUM AND HOMOMORPHIC SPEECH PROCESSING

where ⇐⇒ denotes the unique relationship between a sequence and its DTFT. An interesting result can be obtained if we represent the complex cepstrum as

$$\hat{h}[n] = c[n] + d[n], \tag{8.112}$$

where $c[n] = \mathrm{Ev}\{\hat{h}[n]\}$ is the even part and $d[n] = \mathrm{Odd}\{\hat{h}[n]\}$ is the odd part of the complex cepstrum. Recall that the DTFT of the complex cepstrum is, by definition, $\hat{H}(e^{j\omega}) = \log|H(e^{j\omega})| + j\arg\{H(e^{j\omega})\}$, so that $c[n] \Longleftrightarrow \log|H(e^{j\omega})|$ and $d[n] \Longleftrightarrow j\arg\{H(e^{j\omega})\}$. Using the DTFT property $n\,x[n] \Longleftrightarrow j\,dX(e^{j\omega})/d\omega$, it can be shown that the following DTFT relations hold:

$$n\,c[n] \Longleftrightarrow j\,\frac{d\log|H(e^{j\omega})|}{d\omega} \tag{8.113a}$$

and

$$n\,d[n] \Longleftrightarrow -\frac{d\arg\{H(e^{j\omega})\}}{d\omega}. \tag{8.113b}$$

The DTFT expression on the right in (8.113b) is the group delay function [15] for $H(e^{j\omega})$; i.e.,

$$\mathrm{grd}\{H(e^{j\omega})\} = -\frac{d\arg\{H(e^{j\omega})\}}{d\omega}. \tag{8.114}$$

Now if h[n] is assumed to be obtained by all-pole modeling as discussed in Section 8.6, the complex cepstrum satisfies $\hat{h}[n] = 0$ for $n < 0$. This means that $\hat{h}[n] = 2c[n] = 2d[n]$ for $n > 0$. If we define $l[n] = n$, then the liftered cepstrum distance

$$D = \sum_{m=-\infty}^{\infty} \bigl|l[m]c[m] - l[m]\bar{c}[m]\bigr|^2 = \sum_{m=-\infty}^{\infty} \bigl|l[m]d[m] - l[m]\bar{d}[m]\bigr|^2 \tag{8.115a}$$

is, by Parseval's theorem applied to (8.113a) and (8.113b), equivalent to either

$$D = \frac{1}{2\pi}\int_{-\pi}^{\pi} \left|\frac{d\log|H(e^{j\omega})|}{d\omega} - \frac{d\log|\bar{H}(e^{j\omega})|}{d\omega}\right|^2 d\omega \tag{8.115b}$$

or

$$D = \frac{1}{2\pi}\int_{-\pi}^{\pi} \bigl|\mathrm{grd}\{H(e^{j\omega})\} - \mathrm{grd}\{\bar{H}(e^{j\omega})\}\bigr|^2 d\omega. \tag{8.115c}$$

The result of (8.115b) was also given by Tohkura [26]. Instead of $l[n] = n$ for all n, or the lifter of (8.110), Itakura proposed the lifter

$$l[n] = n^s e^{-n^2/(2\tau^2)}. \tag{8.116}$$

This lifter has great flexibility. For example, if s = 0, we have simply low-quefrency liftering of the cepstrum. If s = 1 and τ is large, we have essentially $l[n] = n$ for small n, with high-quefrency tapering. The effect of liftering with (8.116) is illustrated in Figure 8.40, which shows in (a) the short-time Fourier transform of a segment of voiced speech along with a linear predictive analysis spectrum with p = 12. Part (b) shows the liftered group delay spectrum for s = 1 and τ ranging from 5 to 30.
Observe that as τ increases, the formant frequencies are increasingly emphasized. If larger values of s are used, even greater enhancement of the resonance structure is observed.
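As a numerical check on (8.115a) through (8.115c), and a concrete form of the lifter (8.116), the following is a minimal sketch in Python with NumPy. It assumes a minimum-phase all-pole model $H(z) = 1/(1 - \sum_k \alpha_k z^{-k})$ with unit gain (so $\hat{h}[0] = 0$), obtains $\hat{h}[n]$ with the standard LPC-to-cepstrum recursion, and compares the liftered cepstrum distance with $l[n] = n$ against a group delay distance computed directly from $A(z)$. The helper names (`lpc_to_cepstrum`, `group_delay_allpole`, `itakura_lifter`, `liftered_group_delay`, and so on) and the pole locations of the two test models are hypothetical choices made for illustration only; they are not taken from the text.

```python
import numpy as np


def lpc_to_cepstrum(alpha, n_max):
    """Complex cepstrum h_hat[0..n_max] of H(z) = 1 / (1 - sum_k alpha_k z^-k).

    Minimum-phase recursion: h_hat[n] = alpha_n + sum_{k=1}^{n-1} (k/n) h_hat[k] alpha_{n-k},
    with alpha_n = 0 for n > p. h_hat[0] = log G is left at 0 (unit gain assumed);
    it never matters below because l[0] = 0.
    """
    p = len(alpha)
    h_hat = np.zeros(n_max + 1)
    for n in range(1, n_max + 1):
        acc = alpha[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * h_hat[k] * alpha[n - k - 1]
        h_hat[n] = acc
    return h_hat


def even_odd_parts(h_hat):
    """Two-sided c[n] = Ev{h_hat[n]} and d[n] = Odd{h_hat[n]} for n = -N..N,
    with h_hat[n] = 0 for n < 0 (causal complex cepstrum)."""
    x = np.concatenate((np.zeros(len(h_hat) - 1), h_hat))
    return (x + x[::-1]) / 2, (x - x[::-1]) / 2


def group_delay_allpole(alpha, n_fft):
    """grd{H(e^{jw})} at w = 2*pi*m/n_fft, computed directly from A(z), with H = 1/A.

    For A(e^{jw}) = sum_k a_k e^{-jwk}: grd{A} = Re{sum_k k a_k e^{-jwk} / A}, grd{H} = -grd{A}.
    """
    a = np.concatenate(([1.0], -np.asarray(alpha, dtype=float)))
    k = np.arange(len(a))
    return -np.real(np.fft.fft(k * a, n_fft) / np.fft.fft(a, n_fft))


def itakura_lifter(n, s, tau):
    """Eq. (8.116): l[n] = n^s exp(-n^2 / (2 tau^2))."""
    n = np.asarray(n, dtype=float)
    return n ** s * np.exp(-n ** 2 / (2.0 * tau ** 2))


def liftered_group_delay(h_hat, omega, tau):
    """DTFT of l[n] d[n] with s = 1. Both factors are odd, so the product is even,
    and d[n] = h_hat[n]/2 for n > 0 gives sum_{n>=1} l[n] h_hat[n] cos(w n)."""
    n = np.arange(1, len(h_hat))
    return np.cos(np.outer(omega, n)) @ (itakura_lifter(n, 1, tau) * h_hat[1:])


def alpha_from_poles(poles):
    """Predictor coefficients alpha_k = -a_k, where A(z) = prod_i (1 - p_i z^-1)."""
    return -np.real(np.poly(poles))[1:]


def pole_pair(r, theta):
    return [r * np.exp(1j * theta), r * np.exp(-1j * theta)]


# Two hypothetical 4th-order models; pole radii and angles are made up.
alpha = alpha_from_poles(pole_pair(0.9, 0.3 * np.pi) + pole_pair(0.8, 0.6 * np.pi))
alpha_bar = alpha_from_poles(pole_pair(0.85, 0.35 * np.pi) + pole_pair(0.8, 0.55 * np.pi))

N_CEPS, N_FFT = 300, 2048
h, h_bar = lpc_to_cepstrum(alpha, N_CEPS), lpc_to_cepstrum(alpha_bar, N_CEPS)
m = np.arange(-N_CEPS, N_CEPS + 1)
c, d = even_odd_parts(h)
c_bar, d_bar = even_odd_parts(h_bar)

D_c = np.sum((m * c - m * c_bar) ** 2)  # first sum in (8.115a), l[m] = m
D_d = np.sum((m * d - m * d_bar) ** 2)  # second sum in (8.115a)
# (8.115c): (1/2pi) * integral over one period ~ mean over N_FFT uniform samples.
D_grd = np.mean((group_delay_allpole(alpha, N_FFT) - group_delay_allpole(alpha_bar, N_FFT)) ** 2)
print(D_c, D_d, D_grd)  # equal up to cepstrum truncation and rounding error

# Liftered group delay of H at the two ends of the tau range used in Figure 8.40(b).
w = np.linspace(0, np.pi, 512)
smooth = {tau: liftered_group_delay(h, w, tau) for tau in (5, 30)}
```

Because $\hat{h}[n]$ is causal here, the two sums in (8.115a) agree exactly, and both match the group delay integral up to truncating the cepstrum at `N_CEPS` terms and sampling the integral at `N_FFT` points. Choosing a very large τ in `liftered_group_delay` recovers the unliftered group delay $\sum_{n \ge 1} n\,\hat{h}[n]\cos(\omega n)$, while smaller τ tapers the high quefrencies more strongly and gives a smoother curve; this offers one way to explore, on synthetic models, the trend described for Figure 8.40(b).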
Figure 8.40: (a) Short-time Fourier transform and LPC spectrum (log magnitude vs. frequency, 0 to 4 kHz); (b) liftered group delay spectrum for s = 1 and τ = 5, 15, and 30 (group delay vs. frequency in kHz).

Itakura and Umezaki [7] tested the group delay spectrum distance measure in an automatic speech recognition system. They found that for clean test utterances, the difference in recognition rate was small for different values of s when τ ≈ 5, although performance suffered with increasing s for larger values of τ. This was attributed to the fact that for larger s the group delay spectrum becomes very sharply peaked and thus more sensitive to small differences in formant locations. However, in test conditions with additive white noise and also with linear filtering distortions, recognition rates improved significantly with τ = 5 and increasing values of the parameter s.

8.7.4 Mel-Frequency Cepstrum Coefficients

As we have seen, weighted cepstrum distance measures have a directly equivalent interpretation in terms of log spectrum distance in the frequency domain. This is significant in light of models for human perception of sound, which are based upon a frequency analysis performed in the inner ear. With this in mind, Davis and Mermelstein [2] formulated a new type of cepstrum representation that has come into wide use and is known as the mel-frequency cepstrum coefficients (mfcc). The basic idea is to compute a frequency analysis based upon a filter bank whose filter spacing and bandwidths approximate the critical bands. For a 4 kHz bandwidth, approximately 20 filters are used. In most implementations,