DRAFT: L. R. Rabiner and R. W. Schafer, June 3, 2009<br />
8.7. CEPSTRUM DISTANCE MEASURES 487<br />
[Figure: log magnitude versus frequency in Hz (0 to 4000 Hz). Curves shown: Short-time Fourier Transform; Homomorphic smoothing, nco = 13; LPC smoothing, p = 12; Mel cepstrum smoothing, Nmfcc = 13.]<br />
Figure 8.42: Comparison of spectral smoothing methods to mel-frequency analysis.<br />
them is a spectrum reconstructed by interpolation at the original DFT frequencies.<br />
Note that these spectra are different from one another in detail, but they<br />
have, in common, peaks at the formant resonances. At higher frequencies, the<br />
reconstructed mel-spectrum, of course, has more smoothing due to the structure<br />
of the filter bank.<br />
The mfcc parameters have become firmly established as the basic feature<br />
vector for many speech and acoustic pattern recognition problems. For this reason,<br />
new and efficient ways of computing mfcc[n] are of interest. An intriguing<br />
proposal is to use floating gate electronic technology to implement the filter<br />
bank and the DCT computation with microwatts of power [23].<br />
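As a rough software illustration of the two stages named above (the filter bank and the DCT), the following is a minimal sketch of computing mfcc-style coefficients for a single frame. It is not the chapter's exact definition: the mel mapping 2595 log10(1 + f/700), the triangular filter shapes, the choice of 22 filters, the Hamming window, the 512-point DFT, and the unnormalized DCT-II are all assumptions made for illustration, and the function names are hypothetical. Only the value Nmfcc = 13 is taken from Figure 8.42.

```python
import numpy as np

def hz_to_mel(f):
    # Assumed mel mapping (a common convention, not taken from the text).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, fs):
    """Triangular filters with centers uniformly spaced on the mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    freqs = np.linspace(0.0, fs / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2.0), n_filters + 2))
    bank = np.zeros((n_filters, freqs.size))
    for r in range(n_filters):
        lo, mid, hi = edges[r], edges[r + 1], edges[r + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        # Triangle = lower of the two slopes, clipped at zero outside [lo, hi].
        bank[r] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mfcc_frame(frame, fs, n_filters=22, n_mfcc=13, n_fft=512):
    """mfcc-style coefficients for one frame (len(frame) <= n_fft assumed)."""
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed, n_fft)) ** 2
    # Filter bank stage: log of the mel-weighted band energies.
    log_energy = np.log(mel_filter_bank(n_filters, n_fft, fs) @ power + 1e-12)
    # DCT stage: unnormalized DCT-II of the log energies.
    r = np.arange(n_filters)
    n = np.arange(n_mfcc)[:, None]
    basis = np.cos(np.pi * n * (r + 0.5) / n_filters)
    return basis @ log_energy

# Usage: one 40 ms frame of synthetic noise at an 8 kHz sampling rate.
rng = np.random.default_rng(0)
frame = rng.standard_normal(320)
coeffs = mfcc_frame(frame, fs=8000)
```

Under these assumptions, scaling the input frame by a constant gain shifts every log band energy by the same amount, so only the zeroth coefficient changes. This is one reason the higher-order coefficients are attractive as recognition features.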
8.7.5 Dynamic Cepstral Features<br />
The set of mel-frequency cepstral coefficients (mfcc) provides perceptually meaningful<br />
and smooth estimates of the speech spectra over time, and has been used<br />
effectively in a range of speech processing systems [18]. Since speech is inherently<br />
a dynamic signal, changing regularly in time, it is reasonable to seek a<br />
representation that includes some aspect of the dynamic nature of the speech<br />
signal. To this end, Furui [4] proposed the use of estimates of the time derivatives (both<br />
first- and second-order derivatives) of the short-term cepstrum. Furui called the