DRAFT: L. R. Rabiner and R. W. Schafer, June 3, 2009
8.7. CEPSTRUM DISTANCE MEASURES
[Figure 8.40: (a) Short-time Fourier transform and LPC spectrum (log magnitude versus frequency in kHz, 0 to 4 kHz). (b) Liftered group delay spectrum (s = 1) for τ = 5, τ = 15, and τ = 30 (group delay versus frequency in kHz, 0 to 4 kHz).]
Itakura and Umezaki [7] tested the group delay spectrum distance measure
in an automatic speech recognition system. They found that for clean test
utterances, the difference in recognition rate was small for different values of s
when τ ≈ 5, although performance suffered with increasing s for larger values of
τ. This was attributed to the fact that for larger s the group delay spectrum
becomes very sharply peaked and thus more sensitive to small differences in
formant locations. However, in test conditions with additive white noise and
also with linear filtering distortions, recognition rates improved significantly
with τ = 5 and increasing values of the parameter s.
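
The role of s and τ can be explored numerically. The sketch below is a minimal illustration, not the authors' implementation: it assumes the lifter has the form w(n) = n^s exp(−n²/(2τ²)), which is the form commonly associated with the smoothed group delay spectrum but is not restated in this passage. The function names and the synthetic single-resonance cepstrum are hypothetical, introduced only for the example.

```python
import numpy as np

def group_delay_lifter(n_max, s=1.0, tau=5.0):
    """Assumed lifter w(n) = n**s * exp(-n**2 / (2 * tau**2)), n = 1..n_max."""
    n = np.arange(1, n_max + 1, dtype=float)
    return n**s * np.exp(-(n**2) / (2.0 * tau**2))

def liftered_cepstrum_distance(c_ref, c_test, s=1.0, tau=5.0):
    """Squared Euclidean distance between liftered cepstra c[1..N] (c[0] excluded)."""
    c_ref = np.asarray(c_ref, dtype=float)
    c_test = np.asarray(c_test, dtype=float)
    w = group_delay_lifter(len(c_ref), s, tau)
    return float(np.sum((w * (c_ref - c_test)) ** 2))

def liftered_group_delay_spectrum(c, s=1.0, tau=5.0, n_freq=256):
    """Evaluate sum_n w(n) c[n] cos(omega n) on n_freq points of omega in [0, pi]."""
    c = np.asarray(c, dtype=float)
    n = np.arange(1, len(c) + 1)
    omega = np.linspace(0.0, np.pi, n_freq)
    w = group_delay_lifter(len(c), s, tau)
    return omega, np.cos(np.outer(omega, n)) @ (w * c)

def single_resonance_cepstrum(r, theta, n_max):
    """Hypothetical test signal: cepstrum of one complex pole pair at r*exp(+-j*theta)."""
    n = np.arange(1, n_max + 1, dtype=float)
    return 2.0 * r**n * np.cos(theta * n) / n

# Example: a resonance at 1 kHz for an 8 kHz sampling rate, and a slightly shifted copy.
c_a = single_resonance_cepstrum(0.95, 2 * np.pi * 1000 / 8000, 40)
c_b = single_resonance_cepstrum(0.95, 2 * np.pi * 1050 / 8000, 40)
for tau in (5.0, 15.0, 30.0):
    print(tau, liftered_cepstrum_distance(c_a, c_b, s=1.0, tau=tau))
```

Under this assumed lifter, a small τ confines the weighting to low quefrencies and gives a smooth group delay spectrum, while a large τ or large s emphasizes higher quefrencies and gives sharper peaks, which is consistent with the sensitivity to formant location described above.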
8.7.4 Mel-Frequency Cepstrum Coefficients
As we have seen, weighted cepstrum distance measures have a directly equivalent
interpretation in terms of log spectrum distance in the frequency domain. This
is significant in light of models for human perception of sound, which are based
upon a frequency analysis performed in the inner ear. With this in mind, Davis
and Mermelstein [2] formulated a new type of cepstrum representation that has
come to be widely used and known as the mel-frequency cepstrum coefficients
(mfcc).
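
The equivalence between cepstrum distance and log spectrum distance can be checked numerically in its simplest, unweighted form, where it is Parseval's relation for the DFT. The sketch below is only an illustration under stated assumptions: the log magnitude spectra are sampled on a full N-point DFT grid, the cepstrum is taken as the inverse DFT of that sampled log magnitude, and the two first-order and second-order all-pole filters are arbitrary hypothetical test cases. All function names are introduced here for the example.

```python
import numpy as np

def log_mag_all_pole(a, n_fft=512):
    """Log magnitude of 1/A(e^{jw}) on an n_fft-point DFT grid (a = denominator coefficients)."""
    return -np.log(np.abs(np.fft.fft(np.asarray(a, dtype=float), n_fft)))

def real_cepstrum(log_mag):
    """Inverse DFT of a log magnitude spectrum that is even-symmetric on the DFT grid."""
    return np.fft.ifft(log_mag).real

def cepstrum_distance_sq(log_mag_a, log_mag_b):
    """Sum of squared cepstrum differences over all quefrency indices."""
    d = real_cepstrum(log_mag_a) - real_cepstrum(log_mag_b)
    return float(np.sum(d ** 2))

def log_spectrum_distance_sq(log_mag_a, log_mag_b):
    """Mean squared log spectral difference over the DFT grid."""
    d = np.asarray(log_mag_a) - np.asarray(log_mag_b)
    return float(np.mean(d ** 2))

# Two hypothetical all-pole spectra with poles well inside the unit circle.
spec_a = log_mag_all_pole([1.0, -0.9])
spec_b = log_mag_all_pole([1.0, -1.2, 0.72])

print(cepstrum_distance_sq(spec_a, spec_b))
print(log_spectrum_distance_sq(spec_a, spec_b))
```

The two printed values agree to within rounding error. Applying a weight to the cepstrum before taking the distance corresponds to filtering the log spectrum difference along the frequency axis, which is what gives weighted measures their frequency-domain interpretation.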
The basic idea is to compute a frequency analysis based upon a filter bank
with approximately critical band spacing of the filters and bandwidths. For 4
kHz bandwidth, approximately 20 filters are used. In most implementations,