LR Rabiner and RW Schafer, June 3

DRAFT: L. R. Rabiner and R. W. Schafer, June 3, 2009
Chapter 8. The Cepstrum and Homomorphic Speech Processing

[Figure 8.24: Quefrency-aliased (a) complex cepstrum; and (b) the cepstrum. Panel (a): aliased complex cepstrum of $\delta[n] + 0.8\,\delta[n-75]$; panel (b): aliased cepstrum of $\delta[n] + 0.8\,\delta[n-75]$; horizontal axis: quefrency $n$, $0 \le n \le 256$. Circled dots are cepstrum values in correct locations.]

$\hat{X}[k] = \log\{X[k]\} = \log\{X(e^{j2\pi k/N})\}$), then the resulting $N$-point sequence $\tilde{\hat{x}}[n]$ will be given by Eq. (8.58) with Eq. (8.65) substituted for $\hat{x}[n]$. Note that the sequence $\hat{x}[n]$ is non-zero for $n = mN_p$, $1 \le m < \infty$, so aliasing will produce complex cepstrum values that are out of sequence. In fact, we can show that the non-zero values of $\tilde{\hat{x}}[n]$ are at positions $((mN_p))_N$ for all positive integers $m$ (see footnote 10). This is illustrated in Figure 8.24a for the case $N = 256$, $N_p = 75$, and $\alpha = 0.8$. Observe that since $3N_p < N < 4N_p$ for the specific values $N = 256$ and $N_p = 75$, the values $\hat{x}[N_p]$, $\hat{x}[2N_p]$, and $\hat{x}[3N_p]$ are in their correct positions, but values of $\hat{x}[n]$ for $n \ge 4N_p$ "wrap around" into the base interval $0 \le n \le N - 1$. Since $\hat{x}[n] \to 0$ as $n \to \infty$, increasing $N$ will tend to mitigate the effect by allowing more of the impulses to be at their correct positions and, at the same time, ensuring that the aliased samples have smaller amplitudes due to the $1/|n|$ falloff.

As discussed in Refs. [13, 20, 15, 27], a large value of $N$ (that is, a high rate of sampling of the Fourier transform) is also required for accurate computation of the complex logarithm. However, the use of fast Fourier transform (FFT) algorithms makes it feasible to use reasonably large values of $N$, such as $N = 1024$ or $N = 2048$, so quefrency aliasing need not be a significant problem.

Figure 8.24b shows the aliasing effects for the cepstrum, which in this example is the periodic even part of Figure 8.24a.
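The aliasing pattern described above can be checked numerically. The following is a minimal sketch, not the authors' code, and it rests on two assumptions that are not stated in this excerpt: that the signal of Figure 8.24 is $x[n] = \delta[n] + \alpha\,\delta[n - N_p]$, and that its complex cepstrum is $\hat{x}[mN_p] = (-1)^{m+1}\alpha^m/m$ (the power series of $\log(1 + \alpha z^{-N_p})$). The variable names are illustrative. Because $|\alpha| < 1$, the real part of $X[k]$ is positive, so the principal value of the complex logarithm is already continuous and no phase unwrapping is needed for this particular signal.

```python
import numpy as np

# Parameters of the example in Figure 8.24.
N, Np, alpha = 256, 75, 0.8

# N-point DFT of x[n] = delta[n] + alpha * delta[n - Np], written in closed form.
k = np.arange(N)
X = 1.0 + alpha * np.exp(-2j * np.pi * k * Np / N)

# Aliased complex cepstrum: inverse DFT of the complex logarithm of X[k].
xhat_aliased = np.fft.ifft(np.log(X)).real

# Prediction: each impulse xhat[m*Np] = (-1)^(m+1) * alpha^m / m (assumed series)
# lands at position (m*Np) mod N. The sum is truncated at M_TERMS terms, which
# is harmless because alpha^m decays geometrically.
M_TERMS = 400
predicted = np.zeros(N)
for m in range(1, M_TERMS + 1):
    predicted[(m * Np) % N] += (-1) ** (m + 1) * alpha**m / m

# Cepstrum: inverse DFT of log|X[k]|, compared with the periodic even part
# of the aliased complex cepstrum.
cepstrum = np.fft.ifft(np.log(np.abs(X))).real
even_part = 0.5 * (xhat_aliased + xhat_aliased[(-k) % N])

# Positions of the first six impulses after wrapping modulo N.
print([(m * Np) % N for m in range(1, 7)])
print(np.allclose(xhat_aliased, predicted), np.allclose(cepstrum, even_part))
```

Rerunning the sketch with a larger `N` (for example 1024) moves more of the impulses to their unaliased positions, which is the mitigation the text describes.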
Since the cepstrum is non-zero for both positive and negative $n$, the implicit periodicity of the DFT representation causes the negative quefrency samples to be located at positions $((N - n))_N$. Therefore, in this example, only the two circled samples at $n = N_p = 75$ and $n = N - N_p = 256 - 75 = 181$ are in what could be considered their "correct" locations (assuming samples at $128 < n < 256$ to be "negative quefrency" samples). Again, by increasing $N$, we can mitigate the aliasing effects since the cepstrum approaches zero for large $n$.

Footnote 10: Following the notation in [15], $((mN_p))_N$ means $mN_p$ taken modulo $N$.

8.4.2 Computation Based on the z-Transform

An approach to computing the complex cepstrum of finite-length sequences that does not require phase unwrapping is suggested by the example of Section 8.3. In that example, the z-transforms of all the convolved components of the model had closed-form expressions as rational functions. By factoring the numerator and denominator polynomials, it was possible to compute the complex cepstrum exactly.

Short-time analysis of natural speech signals is based upon finite-length (windowed) segments of the speech waveform, and if a sequence $x[n]$ has finite length, then its z-transform is a polynomial in $z^{-1}$ of the form

\[ X(z) = \sum_{n=0}^{M} x[n]\, z^{-n}. \tag{8.66a} \]

Such an $M$th-order polynomial in $z^{-1}$ can be represented in terms of its roots as

\[ X(z) = x[0] \prod_{m=1}^{M_i} \left(1 - a_m z^{-1}\right) \prod_{m=1}^{M_o} \left(1 - b_m^{-1} z^{-1}\right), \tag{8.66b} \]

where the quantities $a_m$ are the (complex) zeros that lie inside the unit circle (minimum-phase part) and the quantities $b_m^{-1}$ are the zeros that are outside the unit circle (maximum-phase part); i.e., $|a_m| < 1$ and $|b_m| < 1$. We assume that no zeros lie precisely on the unit circle (see footnote 11).

Footnote 11: Perhaps not surprisingly, it is rare that a computed root of a polynomial is precisely on the unit circle; however, as previously mentioned, most of the zeros lie close to the unit circle for high-order polynomials.

If we factor a term $-b_m^{-1} z^{-1}$ out of each factor of the product at far right in Eq. (8.66b), then Eq. (8.66b) can be expressed as

\[ X(z) = A\, z^{-M_o} \prod_{m=1}^{M_i} \left(1 - a_m z^{-1}\right) \prod_{m=1}^{M_o} \left(1 - b_m z\right), \tag{8.66c} \]

where

\[ A = x[0]\, (-1)^{M_o} \prod_{m=1}^{M_o} b_m^{-1}. \tag{8.66d} \]

This representation of a windowed frame of speech can be obtained by using a polynomial rooting algorithm to find the zeros $a_m$ and $b_m^{-1}$ that lie inside and outside the unit circle, respectively, for the polynomial whose coefficients are the sequence $x[n]$.
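The factorization of Eqs. (8.66b) through (8.66d) can be sketched with a general-purpose root finder. The example below is illustrative only: it uses NumPy's `numpy.roots` as the rooting algorithm (the text does not name one), the function names `factor_frame`, `eval_factored`, and `eval_direct` are hypothetical, and it assumes $x[0] \neq 0$ and that no computed zero falls exactly on the unit circle. It splits the zeros by magnitude, forms $A$ from Eq. (8.66d), and checks Eq. (8.66c) against the direct sum of Eq. (8.66a) at a few points on the unit circle.

```python
import numpy as np

def factor_frame(x):
    """Return (A, a, b) of Eqs. (8.66c)-(8.66d) for a finite-length frame x[n].

    Assumes x[0] != 0 and that no zero lies exactly on the unit circle.
    """
    x = np.asarray(x, dtype=float)
    # Zeros of X(z) are the roots of x[0] z^M + x[1] z^(M-1) + ... + x[M].
    zeros = np.roots(x)
    a = zeros[np.abs(zeros) < 1.0]        # zeros inside the unit circle
    b = 1.0 / zeros[np.abs(zeros) > 1.0]  # reciprocals of zeros outside it
    A = x[0] * (-1.0) ** len(b) * np.prod(1.0 / b)  # Eq. (8.66d)
    return A, a, b

def eval_factored(A, a, b, z):
    """Evaluate the factored form of Eq. (8.66c) at the points z."""
    z = np.atleast_1d(z).astype(complex)
    min_phase = np.prod(1.0 - a[:, None] / z[None, :], axis=0)
    max_phase = np.prod(1.0 - b[:, None] * z[None, :], axis=0)
    return A * z ** (-float(len(b))) * min_phase * max_phase

def eval_direct(x, z):
    """Evaluate the direct sum of Eq. (8.66a) at the points z."""
    x = np.asarray(x, dtype=float)
    z = np.atleast_1d(z).astype(complex)
    n = np.arange(len(x), dtype=float)
    return (x[:, None] * z[None, :] ** (-n[:, None])).sum(axis=0)

# Small example: X(z) = 1 - 2.5 z^-1 + z^-2 has zeros at z = 0.5 and z = 2.
x = np.array([1.0, -2.5, 1.0])
A, a, b = factor_frame(x)

# Compare the two forms at a few points on the unit circle.
z_test = np.exp(1j * np.linspace(0.1, 3.0, 7))
print(A, a, b)
print(np.allclose(eval_factored(A, a, b, z_test), eval_direct(x, z_test)))
```

For real frames of speech, which have hundreds of samples and zeros clustered near the unit circle, the accuracy of the root finder becomes the practical concern. This sketch makes no attempt to address that.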