LR Rabiner and RW Schafer, June 3


DRAFT: L. R. Rabiner and R. W. Schafer, June 3, 2009

8.7. CEPSTRUM DISTANCE MEASURES

[Figure 8.40, two panels plotted against frequency in kHz (0 to 4): (a) LPC and short-time Fourier spectrum (vertical axis: log magnitude); (b) liftered group delay spectrum (s = 1) for τ = 5, τ = 15, and τ = 30 (vertical axis: group delay).]

Figure 8.40: (a) Short-time Fourier transform and LPC spectrum (b) Liftered group delay spectrum.

Itakura and Umezaki [7] tested the group delay spectrum distance measure
in an automatic speech recognition system. They found that for clean test
utterances, the difference in recognition rate was small for different values of s
when τ ≈ 5, although performance suffered with increasing s for larger values of
τ. This was attributed to the fact that for larger s the group delay spectrum
becomes very sharply peaked and thus more sensitive to small differences in
formant locations. However, in test conditions with additive white noise and
also with linear filtering distortions, recognition rates improved significantly
with τ = 5 and increasing values of the parameter s.
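The roles of s and τ can be explored numerically. The following is only a minimal sketch: it assumes the lifter has the form g[n] = n^s exp(−n²/(2τ²)), which is not stated in the passage above, and it builds the cepstrum from a made-up all-pole model with a single resonance. The function names (`allpole_cepstrum`, `lifter`, `liftered_group_delay`, `liftered_distance`), the pole radius, and the 8 kHz sampling rate are illustrative choices, not values taken from the text.

```python
import numpy as np

def allpole_cepstrum(poles, n_max):
    """Complex cepstrum c[1..n_max] of H(z) = 1 / prod_k (1 - p_k z^-1).

    For a stable all-pole model, c[n] = sum_k p_k**n / n for n >= 1.
    """
    n = np.arange(1, n_max + 1, dtype=float)
    poles = np.asarray(poles, dtype=complex)
    c = (poles[:, None] ** n[None, :]).sum(axis=0) / n
    return c.real  # imaginary parts cancel for conjugate pole pairs

def lifter(n_max, s, tau):
    """Assumed lifter g[n] = n**s * exp(-n**2 / (2 tau**2)), n = 1..n_max."""
    n = np.arange(1, n_max + 1, dtype=float)
    return n ** s * np.exp(-n ** 2 / (2.0 * tau ** 2))

def liftered_group_delay(c, s, tau, n_freq=512):
    """Evaluate sum_n g[n] c[n] cos(omega n) on a grid of omega in [0, pi]."""
    n = np.arange(1, len(c) + 1, dtype=float)
    omega = np.linspace(0.0, np.pi, n_freq)
    weighted = lifter(len(c), s, tau) * c
    return omega, np.cos(np.outer(omega, n)) @ weighted

def liftered_distance(c1, c2, s, tau):
    """Weighted cepstrum distance: sum_n (g[n] * (c1[n] - c2[n]))**2."""
    g = lifter(len(c1), s, tau)
    return float(np.sum((g * (c1 - c2)) ** 2))

# Hypothetical example: one resonance at 1 kHz, sampling rate 8 kHz.
fs = 8000.0
theta = 2.0 * np.pi * 1000.0 / fs
pole = 0.95 * np.exp(1j * theta)
c_ref = allpole_cepstrum([pole, np.conj(pole)], n_max=60)

# Same resonance shifted by 50 Hz, to mimic a small formant difference.
theta_shift = 2.0 * np.pi * 1050.0 / fs
pole_shift = 0.95 * np.exp(1j * theta_shift)
c_test = allpole_cepstrum([pole_shift, np.conj(pole_shift)], n_max=60)

for tau in (5, 15, 30):
    omega, gd = liftered_group_delay(c_ref, s=1, tau=tau)
    peak_hz = omega[np.argmax(gd)] * fs / (2.0 * np.pi)
    d = liftered_distance(c_ref, c_test, s=1, tau=tau)
    print(f"tau={tau:2d}  peak near {peak_hz:7.1f} Hz  "
          f"peak height {gd.max():6.2f}  distance {d:8.4f}")
```

Under these assumptions, a small τ suppresses the high-quefrency terms and gives a broad, low peak, while a large τ lets them through and gives a taller, narrower peak, which matches the sharpening attributed above to sensitivity to small formant differences.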

8.7.4 Mel-Frequency Cepstrum Coefficients

As we have seen, weighted cepstrum distance measures have a directly equivalent
interpretation in terms of log spectrum distance in the frequency domain. This
is significant in light of models for human perception of sound, which are based
upon a frequency analysis performed in the inner ear. With this in mind, Davis
and Mermelstein [2] formulated a new type of cepstrum representation that has
come to be widely used and known as the mel-frequency cepstrum coefficients
(mfcc).

The basic idea is to compute a frequency analysis based upon a filter bank
with approximately critical band spacing of the filters and bandwidths. For 4
kHz bandwidth, approximately 20 filters are used. In most implementations,
