LR Rabiner and RW Schafer, June 3


DRAFT: L. R. Rabiner and R. W. Schafer, June 3, 2009
CHAPTER 8. THE CEPSTRUM AND HOMOMORPHIC SPEECH PROCESSING
8.4. COMPUTING THE SHORT-TIME CEPSTRUM AND COMPLEX CEPSTRUM OF SPEECH

8.4.1 Computation Based on the Discrete Fourier Transform

Recall that in Chapter 7 we defined an alternative form of the short-time Fourier transform as

$$\tilde{X}_{\hat{n}}(e^{j\hat{\omega}}) = \sum_{n=0}^{L-1} w[n]\,x[\hat{n}+n]\,e^{-j\hat{\omega}n}, \tag{8.54}$$

where $\hat{n}$ denotes the analysis time and $\hat{\omega}$ denotes a short-time analysis frequency.⁸ That is, the short-time Fourier transform at analysis time $\hat{n}$ is the discrete-time Fourier transform of the finite-length sequence

$$x_{\hat{n}}[n] = \begin{cases} w[n]\,s[\hat{n}+n] & 0 \le n \le L-1 \\ 0 & \text{otherwise} \end{cases} \tag{8.55}$$

where $s[n]$ denotes the speech signal and we assume that $w[n] = 0$ outside the interval $0 \le n \le L-1$. In this formulation, the time origin of the windowed segment is reset from $\hat{n}$ to the origin of $w[n]$. In our basic definition in Chapter 7 (see Eq. (7.8)), the time origin of the window is shifted to the analysis time $\hat{n}$. That basic definition facilitates interpretation in terms of linear filtering and filter banks, but for cepstrum analysis it is preferable to consider the window origin fixed at $n = 0$, with the signal samples to be analyzed being shifted into the window as in Eq. (8.55). This allows us to focus on the interpretation of the short-time Fourier transform as simply the discrete-time Fourier transform of the finite-length sequence $x_{\hat{n}}[n]$.

Since each windowed segment will be processed independently by the techniques of homomorphic filtering, we can simplify our notation by dropping the subscript $\hat{n}$ except where it is necessary to specify the analysis time. Furthermore, there will be no need to distinguish between the discrete-time Fourier transform variable $\omega$ and the specific short-time analysis frequency variable $\hat{\omega}$, since we will focus on the DTFT interpretation. Therefore, the representations of the characteristic system for convolution and its inverse, depicted in Figs. 8.5 and 8.6, respectively, are the basis for a short-time homomorphic system for convolution if we simply note that the input is the finite-length windowed sequence $x[n] = w[n]\,s[\hat{n}+n]$. In other words, the short-time characteristic system for convolution is defined by the equations

$$X(e^{j\omega}) = \sum_{n=0}^{L-1} x[n]\,e^{-j\omega n} \tag{8.56a}$$

$$\hat{X}(e^{j\omega}) = \log\{X(e^{j\omega})\} = \log|X(e^{j\omega})| + j\arg\{X(e^{j\omega})\} \tag{8.56b}$$

$$\hat{x}[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi} \hat{X}(e^{j\omega})\,e^{j\omega n}\,d\omega. \tag{8.56c}$$

Equation (8.56a) is the discrete-time Fourier transform of the windowed input sequence defined by Eq. (8.55), Eq. (8.56b) is the complex logarithm of the discrete-time Fourier transform of the input, and Eq. (8.56c) is the inverse discrete-time Fourier transform of the complex logarithm of the Fourier transform of the input.

As we have already observed, there are questions of uniqueness of this set of equations. In order to clearly define the complex cepstrum with Eqs. (8.56a)-(8.56c), we must provide a unique definition of the complex logarithm of the Fourier transform. To do this, it is helpful to impose the constraint that the complex cepstrum of a real input sequence also be a real sequence. Recall that for a real sequence the real part of the Fourier transform is an even function and the imaginary part is odd. Therefore, if the complex cepstrum is to be a real sequence, we must define the log magnitude function to be an even function of $\omega$, and the phase must be defined to be an odd function of $\omega$. As we have already asserted, a further sufficient condition for the complex logarithm to be unique is that the phase be computed so that it is a continuous periodic function of $\omega$ with period $2\pi$ [13, 20].

Algorithms for the computation of an appropriate phase function typically start with the principal value phase sampled at the DFT frequencies as a basis for a search for discontinuities of size $2\pi$. Because of the sampling, care must be taken to locate the frequencies at which the discontinuities occur. A simple approach that generally works well if the phase is densely sampled is to search for jumps (either positive or negative) of size greater than some prescribed tolerance.⁹ Once the frequencies where the principal value "wraps around" are found, the appropriate multiples of $2\pi$ radians can be added or subtracted to produce the "unwrapped phase" [20, 27]. Another method of computing the phase is discussed in Section 8.4.2.

Although Eqs. (8.56a)-(8.56c) can be useful for theoretical analysis, they are not in a form that is useful for computation, since Eq. (8.56c) requires the evaluation of an integral. However, we can approximate Eq. (8.56c) by using the discrete Fourier transform. The discrete Fourier transform (DFT) of a finite-length sequence is identical to a sampled version of the discrete-time Fourier transform (DTFT) of that same sequence [15]; i.e., $X[k] = X(e^{j2\pi k/N})$. Furthermore, the discrete Fourier transform can be efficiently computed by a fast Fourier transform algorithm [15]. Thus, the approach that is suggested for computing the complex cepstrum is to replace all of the DTFT operations in Figure 8.4 by corresponding DFT operations. The resulting implementation of the characteristic system is depicted in Figure 8.21 and defined by the equations

[Figure 8.21, block diagram of $\tilde{\mathcal{D}}_*\{\cdot\}$: $x[n]$ → DFT → $X[k]$ → Complex Log → $\hat{X}[k]$ → IDFT → $\tilde{\hat{x}}[n]$]

Figure 8.21: Computation of the complex cepstrum using the DFT (implementation of the approximate characteristic system for convolution $\tilde{\mathcal{D}}_*\{\cdot\}$).

⁸ This definition is identical to the alternate definition in Eq. (7.10) except that we have redefined the window by replacing $w[-n]$ by $w[n]$.

⁹ The unwrap( ) function in Matlab uses a default tolerance of $\pi$, which is reasonable, since jumps of close to $\pi$ radians can occur because of zeros that are very close to the unit circle.
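The DFT-based computation described in this section (DFT, complex logarithm with an unwrapped phase, inverse DFT) can be sketched in a few lines of Python with NumPy. This is an illustration only, not code from the text: the function name `complex_cepstrum_dft` and the parameter `nfft` are hypothetical, and NumPy's `np.unwrap` (whose default jump tolerance is also π) stands in for the phase-unwrapping search described above. The sketch assumes a real input whose DFT has no zero-valued samples and a positive value at $\omega = 0$, and it makes no attempt to handle a sign inversion or a linear-phase component.

```python
import numpy as np


def complex_cepstrum_dft(x, nfft):
    """Approximate the complex cepstrum of a real, finite-length sequence.

    Hypothetical helper for illustration; assumes no DFT sample is zero.
    """
    # DFT: samples of the DTFT at omega_k = 2*pi*k/nfft (x is zero-padded).
    X = np.fft.fft(x, n=nfft)
    # Principal-value phase, then remove jumps larger than pi (np.unwrap default).
    phase = np.unwrap(np.angle(X))
    # Complex logarithm: log magnitude plus j times the unwrapped phase.
    log_X = np.log(np.abs(X)) + 1j * phase
    # Inverse DFT; with even log magnitude and odd phase the result is real
    # up to rounding error, so only the real part is returned.
    return np.fft.ifft(log_X).real


# Example: x[n] = delta[n] - 0.5*delta[n-1], whose complex cepstrum is
# -(0.5**n)/n for n >= 1 and zero for n <= 0.
x = np.array([1.0, -0.5])
c = complex_cepstrum_dft(x, nfft=1024)
n = np.arange(1, 6)
print(np.allclose(c[1:6], -(0.5 ** n) / n))
```

Because the DFT samples the DTFT, the result is a time-aliased version of the true complex cepstrum. A DFT length much larger than the window length, as in the example, keeps that aliasing small and samples the phase densely enough for the simple jump search to be reliable.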