
DRAFT: L. R. Rabiner and R. W. Schafer, June 3, 2009

8.7. CEPSTRUM DISTANCE MEASURES 487

[Figure: log magnitude versus frequency in Hz (0 to 4000 Hz), comparing the short-time Fourier transform with homomorphic smoothing (nco = 13), LPC smoothing (p = 12), and mel cepstrum smoothing (Nmfcc = 13).]

Figure 8.42: Comparison of spectral smoothing methods to mel-frequency analysis.

them is a spectrum reconstructed by interpolation at the original DFT frequencies. Note that these spectra differ from one another in detail, but they share peaks at the formant resonances. At higher frequencies, the reconstructed mel-spectrum is, as expected, smoothed more heavily, because the mel-scale filters become wider and more widely spaced as frequency increases.
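Homomorphic smoothing, one of the methods plotted in Figure 8.42, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the book's exact procedure: the frame is Hamming-windowed, transformed with an nfft-point FFT, the natural log of the magnitude is taken, and the real cepstrum is liftered with a symmetric window that keeps the values for |n| < nco. The function name `homomorphic_smooth` and these conventions are illustrative.

```python
import numpy as np


def homomorphic_smooth(frame, nfft=512, nco=13):
    """Return (raw, smoothed) natural-log magnitude spectra for bins 0..nfft/2.

    Assumed convention: keep real-cepstrum values c[n] for |n| < nco.
    """
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    X = np.fft.fft(x, nfft)                      # zero-pads the frame to nfft points
    log_mag = np.log(np.abs(X) + 1e-12)          # small floor avoids log(0)
    c = np.real(np.fft.ifft(log_mag))            # real cepstrum c[n], n = 0..nfft-1

    # Symmetric low-quefrency lifter: n = 0..nco-1 at the start of the array,
    # n = -(nco-1)..-1 stored at the end (DFT index wrap-around).
    lifter = np.zeros(nfft)
    lifter[:nco] = 1.0
    lifter[nfft - nco + 1:] = 1.0

    smoothed = np.real(np.fft.fft(c * lifter))   # back to the log-magnitude domain
    half = nfft // 2 + 1
    return log_mag[:half], smoothed[:half]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    raw, smooth = homomorphic_smooth(rng.standard_normal(400), nfft=512, nco=13)
    print(raw.shape, smooth.shape)
```

Calling the function with nco = nfft/2 + 1 keeps every cepstral value and returns the unsmoothed log magnitude, which is a convenient sanity check; smaller values of nco retain only the slowly varying spectral envelope.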

The mfcc parameters have become firmly established as the basic feature vector for many speech and acoustic pattern recognition problems. For this reason, efficient new ways of computing mfcc[n] are of interest. One intriguing proposal is to use floating-gate electronic technology to implement the filter bank and the DCT computation using only microwatts of power [23].
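The two stages named above, a mel filter bank followed by a DCT, can be sketched for a single frame as follows. The parameter choices are assumptions, not taken from the text: fs = 8000 Hz, a 512-point FFT, 24 triangular filters with centers equally spaced on the mel scale mel(f) = 2595 log10(1 + f/700), filter energies computed from the power spectrum, natural log, and an unnormalized DCT-II keeping 13 coefficients. The helper names (`hz_to_mel`, `mel_filterbank`, `mfcc`) are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    # Assumed mel-scale formula (one common variant).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, nfft, fs):
    """Triangular filters with centers equally spaced in mel from 0 to fs/2."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    hz_pts = mel_to_hz(mel_pts)                      # edges/centers in Hz
    freqs = np.arange(nfft // 2 + 1) * fs / nfft     # DFT bin frequencies in Hz
    fb = np.zeros((n_filters, freqs.size))
    for r in range(n_filters):
        lo, ctr, hi = hz_pts[r], hz_pts[r + 1], hz_pts[r + 2]
        rising = (freqs - lo) / (ctr - lo)
        falling = (hi - freqs) / (hi - ctr)
        fb[r] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def mfcc(frame, fs=8000, nfft=512, n_filters=24, n_mfcc=13):
    """Mel-frequency cepstral coefficients of one frame (illustrative sketch)."""
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(x, nfft)) ** 2                  # bins 0..nfft/2
    energies = mel_filterbank(n_filters, nfft, fs) @ power     # one value per filter
    log_e = np.log(energies + 1e-12)                           # floor avoids log(0)
    r = np.arange(n_filters)
    n = np.arange(n_mfcc)[:, None]
    dct = np.cos(np.pi * n * (r + 0.5) / n_filters)            # DCT-II basis, unnormalized
    return dct @ log_e


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    print(mfcc(rng.standard_normal(256)).shape)
```

Because every DCT-II basis vector with n ≥ 1 sums to zero over the filters, scaling the frame by a constant changes only mfcc[0]; the remaining coefficients describe the spectral shape independently of overall level.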

8.7.5 Dynamic Cepstral Features

The set of mel-frequency cepstral coefficients (mfcc) provides perceptually meaningful and smooth estimates of the speech spectrum over time, and these coefficients have been used effectively in a range of speech processing systems [18]. Since speech is inherently a dynamic signal whose properties change continually over time, it is reasonable to seek a representation that captures some aspect of this dynamic behavior. Accordingly, Furui [4] proposed using estimates of the time derivatives (both first- and second-order) of the short-term cepstrum. Furui called the
