LR Rabiner and RW Schafer, June 3


DRAFT: L. R. Rabiner and R. W. Schafer, June 3, 2009
CHAPTER 8. THE CEPSTRUM AND HOMOMORPHIC SPEECH PROCESSING

form

$$H(z)=\frac{G}{A(z)}=\frac{G}{1-\displaystyle\sum_{k=1}^{p}\alpha_k z^{-k}}=\frac{G}{\displaystyle\prod_{k=1}^{p}\left(1-z_k z^{-1}\right)}. \tag{8.100}$$

The optimal values of the parameters $G$ and $\alpha_k$ for $k = 1, 2, \ldots, p$ can be found by solving a set of linear equations whose coefficients are determined by the autocorrelation function of the windowed speech segment $x[n]$. The parameters $z_k$ are simply the roots of the denominator polynomial $A(z)$ (i.e., the poles of $H(z)$), which can be found by polynomial rooting. The corresponding impulse response satisfies the difference equation

$$h[n]=\sum_{k=1}^{p}\alpha_k\,h[n-k]+G\,\delta[n]. \tag{8.101}$$

An important feature of the all-pole model produced by linear predictive techniques is that the model is minimum phase; i.e., all the poles satisfy $|z_k| < 1$. This means that the complex cepstrum of the impulse response of the all-pole model has the property $\hat{h}[n] = 0$ for $n < 0$. The complex cepstrum (and therefore its even part, the cepstrum) can be determined in several ways. One approach recognizes that Eq. (8.100) is a special case of Eq. (8.19) with $M_i = M_o = 0$ and $N_i = p$. Using Eq. (8.23), it follows that the complex cepstrum of the impulse response of the minimum-phase all-pole model is

$$\hat{h}[n]=\begin{cases}0, & n<0\\ \log G, & n=0\\ \displaystyle\sum_{k=1}^{p}\frac{z_k^{\,n}}{n}, & n>0.\end{cases} \tag{8.102}$$

This method of computing the cepstrum of an all-pole model is depicted in the block diagram of Figure 8.39a. A second approach takes advantage of the recursion formula Eq. (8.82) of Section 8.4.3. Since $h[n]$ is minimum phase, and since it follows from Eq. (8.101) that $h[0] = G$, Eq. (8.82) becomes

$$\hat{h}[n]=\begin{cases}0, & n<0\\ \log G, & n=0\\ \dfrac{h[n]}{G}-\displaystyle\sum_{k=0}^{n-1}\left(\frac{k}{n}\right)\hat{h}[k]\,\frac{h[n-k]}{G}, & n>0.\end{cases} \tag{8.103}$$

As depicted in Figure 8.39b, linear predictive analysis can provide the parameters of the difference equation in Eq. (8.101), which can in turn be used to compute any number of samples of the impulse response of the all-pole model. Then Eq. (8.103) can be used to compute as many samples of the complex cepstrum as there are available samples of the impulse response. If it is desired to go from the complex cepstrum of the impulse response of the minimum-phase model back to the impulse response itself, we need only rearrange the terms in Eq. (8.103) to
obtain

$$h[n]=\begin{cases}0, & n<0\\ G, & n=0\\ G\,\hat{h}[n]+\displaystyle\sum_{k=0}^{n-1}\left(\frac{k}{n}\right)\hat{h}[k]\,h[n-k], & n>0.\end{cases} \tag{8.104}$$

This method of computing the complex cepstrum of the all-pole vocal tract model relies on the linear predictive analysis to remove the effects of the excitation. By restricting $p$ to be much less than the pitch period $N_p$, linear predictive modeling accomplishes what the lowpass lifter accomplishes in homomorphic filtering.

[Figure 8.39 block diagrams: (a) $x[n]$ → Linear Predictive Analysis → $H(z) = G/A(z)$ → factor $A(z)$ to obtain $z_k$ → employ (8.102) → $\hat{h}[n]$; (b) $x[n]$ → Linear Predictive Analysis → $H(z) = G/A(z)$ → compute $h[n]$ by recursion (8.101) → employ (8.103) → $\hat{h}[n]$.]
Figure 8.39: Computation of the complex cepstrum of the impulse response of an all-pole minimum-phase model of the vocal tract system: (a) polynomial rooting of the denominator of the all-pole system function; (b) recursive computation using Eq. (8.103). (Numbers in parentheses refer to text equations.)

8.7 Cepstrum Distance Measures

Perhaps the most pervasive application of the cepstrum in speech processing is its use in pattern recognition problems such as vector quantization (VQ) and automatic speech recognition (ASR). In such applications, a speech signal is represented on a frame-by-frame basis by a sequence of short-time cepstrums. In later discussions in this section, it will be useful to use somewhat more complicated notation. Specifically, we denote the cepstrum of the $m$th frame $x_m[n]$ of a signal as $c_m^{(x)}[n]$, where $n$ denotes the quefrency index of the cepstrum. In cases where it is not necessary to distinguish between signals or frames, these additional designations will be omitted, as we have done up to this point in this chapter. Cepstrum-like representations can be obtained in many ways, as we have seen. No matter how it is computed, we can assume that the cepstrum vector corresponds to a gain-normalized ($c[0] = 0$) minimum-phase vocal tract impulse
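As a minimal sketch of how such a cepstrum vector might be obtained from an all-pole model, the two routes of Figure 8.39 and the inverse recursion of Eq. (8.104) can be written in Python with NumPy. The predictor coefficients `alpha` and gain `G` are assumed to be given (e.g., by a linear predictive analysis not shown here), and all function names are hypothetical helpers for illustration. For a minimum-phase model, the rooting route of Eq. (8.102) and the recursive route of Eqs. (8.101) and (8.103) should agree to within rounding error; zeroing the $n = 0$ sample then gives the gain-normalized form with $c[0] = 0$.

```python
import numpy as np


def allpole_impulse_response(alpha, G, N):
    """First N samples of h[n] from the difference equation (8.101):
    h[n] = sum_{k=1}^{p} alpha_k h[n-k] + G delta[n]."""
    p = len(alpha)
    h = np.zeros(N)
    for n in range(N):
        acc = G if n == 0 else 0.0
        for k in range(1, min(p, n) + 1):
            acc += alpha[k - 1] * h[n - k]
        h[n] = acc
    return h


def cepstrum_by_rooting(alpha, G, N):
    """Complex cepstrum hhat[0..N-1] from the poles z_k of H(z), Eq. (8.102)."""
    # z^p A(z) = z^p - alpha_1 z^(p-1) - ... - alpha_p; np.roots expects
    # coefficients ordered from the highest power down.
    z = np.roots(np.concatenate(([1.0], -np.asarray(alpha, dtype=float))))
    if np.any(np.abs(z) >= 1.0):
        raise ValueError("model is not minimum phase; Eq. (8.102) does not apply")
    hhat = np.zeros(N)
    hhat[0] = np.log(G)
    for n in range(1, N):
        # Complex poles occur in conjugate pairs, so the sum is real.
        hhat[n] = np.real(np.sum(z ** n)) / n
    return hhat


def cepstrum_by_recursion(h, G):
    """Complex cepstrum from impulse-response samples, Eq. (8.103)."""
    N = len(h)
    hhat = np.zeros(N)
    hhat[0] = np.log(G)
    for n in range(1, N):
        # The k = 0 term vanishes because of the factor k/n, so start at k = 1.
        acc = sum((k / n) * hhat[k] * h[n - k] for k in range(1, n))
        hhat[n] = h[n] / G - acc / G
    return hhat


def impulse_response_from_cepstrum(hhat, N):
    """Inverse recursion, Eq. (8.104); G is recovered as exp(hhat[0])."""
    G = np.exp(hhat[0])
    h = np.zeros(N)
    h[0] = G
    for n in range(1, N):
        acc = sum((k / n) * hhat[k] * h[n - k] for k in range(1, n))
        h[n] = G * hhat[n] + acc
    return h


if __name__ == "__main__":
    alpha = [1.3, -0.8]  # illustrative 2nd-order model; pole radius sqrt(0.8)
    G, N = 0.5, 30
    h = allpole_impulse_response(alpha, G, N)
    c_root = cepstrum_by_rooting(alpha, G, N)
    c_rec = cepstrum_by_recursion(h, G)
    print(np.allclose(c_root, c_rec))
    print(np.allclose(impulse_response_from_cepstrum(c_rec, N), h))
    c_norm = c_rec.copy()
    c_norm[0] = 0.0  # gain normalization: c[0] = 0
    print(c_norm[:5])
```

The recursive route needs only the difference equation and no root finder, while the rooting route makes explicit how each cepstral coefficient depends on the pole locations. Both loops are written for clarity and cost $O(N^2)$ operations for $N$ cepstral samples.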