LR Rabiner and RW Schafer, June 3

DRAFT: L. R. Rabiner and R. W. Schafer, June 3, 2009. Chapter 8, The Cepstrum and Homomorphic Speech Processing; Section 8.5, Homomorphic Filtering of Natural Speech (pp. 470-471).

[Figure 8.30: Homomorphic analysis of voiced speech; (a) complex cepstrum $\tilde{\hat{x}}[n]$; (b) cepstrum $\tilde{c}[n]$. Horizontal axis: quefrency (samples), from −150 to 150. (For comparison to Figure 8.27, the samples of $\tilde{\hat{x}}[n]$ and $\tilde{c}[n]$ were reordered by placing the “negative quefrency” samples from the interval $N/2 < n \le N-1$ before the samples in the range $0 \le n < N/2$.)]

The plots in Figures 8.29a and 8.29b and the cepstrum plots in Figure 8.30 suggest how homomorphic filtering can be used to separate the excitation and vocal tract components. First, note that the impulses in the complex cepstrum due to the periodic excitation tend to be separated from the low quefrency components. This suggests that the appropriate system for short-time homomorphic filtering of speech is as depicted in Figure 8.31, which shows a segment of speech selected by the window, $w[n]$, with the complex cepstrum computed as discussed in Section 8.4.$^{17}$ The desired component of the input is selected by what might be termed a “cepstrum window”, denoted $l[n]$.

[Figure 8.31: Implementation of a system for short-time homomorphic filtering of speech. Block diagram: $s[n]$ is multiplied by $w[n]$ to give $x[n]$; $\mathcal{D}_*\{\cdot\}$ produces $\hat{x}[n]$; multiplication by $l[n]$ gives $\hat{y}[n]$; $\mathcal{D}_*^{-1}\{\cdot\}$ produces $y[n]$.]

This type of filtering is appropriately called “frequency-invariant linear filtering” since multiplying the complex cepstrum by $l[n]$ corresponds to convolving its DTFT $L(e^{j\omega})$ with the complex logarithm $\hat{X}(e^{j\omega})$ as in

$$\hat{Y}(e^{j\omega}) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \hat{X}(e^{j\theta})\, L(e^{j(\omega-\theta)})\, d\theta. \tag{8.95}$$

This operation, which is simply linear filtering of the complex logarithm of the DTFT, was also called “liftering” by Bogert et al. [1], and therefore $l[n]$ is often called a “lifter”. The resulting windowed complex cepstrum is processed by the inverse characteristic system to recover the desired component. This is illustrated by the thick lines in Figures 8.29a and 8.29b, which show the log magnitude and phase obtained in the process of implementing the inverse characteristic system (i.e., $\hat{Y}(e^{j\omega})$) when $l[n]$ is of the form

$$l_{lp}[n] = \begin{cases} 1, & |n| < n_{co} \\ 0.5, & |n| = n_{co} \\ 0, & |n| > n_{co}, \end{cases} \tag{8.96}$$

where, in general, $n_{co}$ is chosen to be less than the pitch period, $N_p$, and in this example, $n_{co} = 50$ as shown in Figure 8.30a.$^{18}$

When using the DFT implementation, the lifter in Eq. (8.96) must conform to the sample ordering of the DFT, i.e., the negative quefrencies fall in the interval $N/2 < n \le N-1$ for an $N$-point DFT. Thus, for DFT implementations, the lowpass lifter has the form

$$\tilde{l}_{lp}[n] = \begin{cases} 1, & 0 \le n < n_{co} \\ 0.5, & n = n_{co} \\ 0, & n_{co} < n < N - n_{co} \\ 0.5, & n = N - n_{co} \\ 1, & N - n_{co} < n \le N - 1. \end{cases} \tag{8.97}$$

For simplicity, we shall henceforth define lifters in DTFT form as in Eq. (8.96), recognizing that the DFT form is always obtained by the process that yielded Eq. (8.97).

The thick lines that are superimposed on the plots of $\log|X(e^{j\omega})|$ and $\arg\{X(e^{j\omega})\}$ in Figures 8.29a and 8.29b show the real and imaginary parts of $\hat{Y}(e^{j\omega})$ corresponding to the liftered complex cepstrum $\hat{y}[n] = l_{lp}[n]\,\hat{x}[n]$. By comparing these plots to the corresponding plots with thin lines in Figures 8.29b and 8.29c respectively, it can be seen that $\hat{Y}(e^{j\omega})$ is a lowpass filtered (smoothed) version of $\hat{X}(e^{j\omega})$.

Footnote 17: For theoretical analysis, the operators $\mathcal{D}_*\{\cdot\}$ and $\mathcal{D}_*^{-1}\{\cdot\}$ would be represented in terms of the DTFT, but in practice, we would use the operators $\tilde{\mathcal{D}}_*\{\cdot\}$ and $\tilde{\mathcal{D}}_*^{-1}\{\cdot\}$ implemented using the DFT with $N$ large enough to avoid aliasing in the cepstrum.
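The relationship between the two lifter forms in Eqs. (8.96) and (8.97) can be checked numerically. The following is a minimal sketch, assuming NumPy and an even DFT length; the function name `lowpass_lifter_dft` and the choice `N = 1024` are illustrative assumptions, while `n_co = 50` is the value quoted in the text.

```python
import numpy as np

def lowpass_lifter_dft(N, n_co):
    """Lowpass lifter of Eq. (8.97), stored in DFT sample ordering (length N)."""
    if not 0 < n_co < N // 2:
        raise ValueError("n_co must satisfy 0 < n_co < N/2")
    lifter = np.zeros(N)
    lifter[:n_co] = 1.0             # 0 <= n < n_co
    lifter[n_co] = 0.5              # n = n_co (one-sample transition)
    lifter[N - n_co] = 0.5          # n = N - n_co (mirror of the transition)
    lifter[N - n_co + 1:] = 1.0     # N - n_co < n <= N - 1 (negative quefrencies)
    return lifter

N, n_co = 1024, 50                  # N is an assumed DFT length
l_dft = lowpass_lifter_dft(N, n_co)

# Reorder so that negative quefrencies come first, as in the Figure 8.30 plots.
n = np.arange(-N // 2, N // 2)      # centered quefrency axis
l_centered = np.fft.fftshift(l_dft)

# Eq. (8.96) evaluated directly on the centered axis, for comparison.
l_dtft_form = np.where(np.abs(n) < n_co, 1.0,
                       np.where(np.abs(n) == n_co, 0.5, 0.0))
print(np.array_equal(l_centered, l_dtft_form))
```

The same `fftshift` reordering applies to a cepstrum computed with the DFT when it is plotted against a centered quefrency axis.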
The result of the lowpass liftering is to remove the effect of the excitation in the short-time Fourier transform. That is, retaining only the low quefrency components of the complex cepstrum is a way of estimating $\hat{H}_V(e^{j\omega}) = \log|H_V(e^{j\omega})| + j\arg\{H_V(e^{j\omega})\}$, the complex logarithm of the frequency response of the vocal tract system. We see that the smoothed log magnitude function in Figure 8.29a clearly displays formant resonances at about 500, 1500, 2250, and 3100 Hz. Also note that if the lifter $l_{lp}[n]$ is applied

Footnote 18: A one-sample transition is included in Eq. (8.96). Expanding or omitting this transition completely usually has little effect.
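The smoothing effect of lowpass liftering can be illustrated with a short sketch. It is a simplification, not the system of Figure 8.31: only the real part of the complex logarithm (the log magnitude) is processed, so the cepstrum $\tilde{c}[n]$ is used in place of the complex cepstrum and no phase unwrapping is needed. The synthetic signal (impulse train through two resonators), the sampling rate, pitch period, resonator parameters, window length, DFT length, and all function names are assumptions made for illustration; only `n_co = 50` comes from the text.

```python
import numpy as np

def resonator_impulse_response(f0, bw, fs, length):
    """Impulse response of a two-pole resonator (illustrative vocal tract stand-in)."""
    r = np.exp(-np.pi * bw / fs)            # pole radius from bandwidth
    w0 = 2 * np.pi * f0 / fs                # pole angle from center frequency
    n = np.arange(length)
    return r**n * np.sin((n + 1) * w0) / np.sin(w0)

def smoothed_log_magnitude(frame, N, n_co):
    """Return (log|X[k]|, cepstrum, lowpass-liftered log magnitude) for one frame."""
    X = np.fft.fft(frame, N)
    log_mag = np.log(np.abs(X) + 1e-12)     # small offset guards against log(0)
    c = np.fft.ifft(log_mag).real           # cepstrum in DFT ordering

    # Build the lifter in the centered form of Eq. (8.96), then move it to
    # DFT ordering, which reproduces Eq. (8.97).
    n = np.arange(-N // 2, N // 2)
    l_centered = np.where(np.abs(n) < n_co, 1.0,
                          np.where(np.abs(n) == n_co, 0.5, 0.0))
    lifter = np.fft.ifftshift(l_centered)

    smooth = np.fft.fft(lifter * c).real    # real part of the liftered log spectrum
    return log_mag, c, smooth

# Assumed synthetic "voiced speech": impulse train through two resonators.
fs, Np = 8000, 80                           # assumed sampling rate and pitch period
excitation = np.zeros(2000)
excitation[::Np] = 1.0
h = np.convolve(resonator_impulse_response(500, 80, fs, 400),
                resonator_impulse_response(1500, 100, fs, 400))
s = np.convolve(excitation, h)[:len(excitation)]

frame = s[800:1200] * np.hamming(400)       # short-time segment selected by w[n]
N, n_co = 2048, 50
log_mag, c, smooth = smoothed_log_magnitude(frame, N, n_co)

def roughness(a):
    """Energy of the circular first difference along the frequency axis."""
    return np.sum((np.roll(a, -1) - a) ** 2)

print(roughness(smooth) < roughness(log_mag))
```

Plotting `smooth[:N // 2]` over `log_mag[:N // 2]` against the frequency axis `np.arange(N // 2) * fs / N` gives a picture analogous to the thick and thin lines described for Figure 8.29a. Extending the sketch to the full complex cepstrum would additionally require an unwrapped phase, which is omitted here.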