
LR Rabiner and RW Schafer, June 3


DRAFT: L. R. Rabiner and R. W. Schafer, June 3, 2009

CHAPTER 8. THE CEPSTRUM AND HOMOMORPHIC SPEECH PROCESSING

a short-time Fourier analysis is done first, resulting in a DFT, $X_m[k]$, for the $m$th frame. Then the DFT values are grouped together in critical bands and weighted by triangular weighting functions such as those depicted in Fig. 8.41.

Note that the bandwidths in Fig. 8.41 are constant for center frequencies below 1 kHz and then increase exponentially up to 4 kHz (half the sampling rate), resulting in 24 “filters”.

Figure 8.41: DFT weighting functions for mel-frequency-cepstrum computations. (Horizontal axis: frequency in Hz, 0 to 4000.)

The mel-spectrum of the $m$th frame is defined for $r = 1, 2, \ldots, R$ as

$$
\mathrm{MF}_m[r] = \frac{1}{A_r} \sum_{k=L_r}^{U_r} \left| V_r[k]\, X_m[k] \right|^2 \tag{8.117a}
$$

where $V_r[k]$ is the weighting function for the $r$th filter, ranging from DFT index $L_r$ to $U_r$, and

$$
A_r = \sum_{k=L_r}^{U_r} \left| V_r[k] \right|^2 \tag{8.117b}
$$

is a normalizing factor for the $r$th mel-filter. This normalization is built into the plot of Fig. 8.41. It is needed so that a perfectly flat input Fourier spectrum will produce a flat mel-spectrum. For each frame, a discrete cosine transform of the logarithm of the magnitude of the filter outputs is computed to form the function $\mathrm{mfcc}_m[n]$ as

$$
\mathrm{mfcc}_m[n] = \frac{1}{R} \sum_{r=1}^{R} \log\left(\mathrm{MF}_m[r]\right) \cos\left[ \frac{2\pi}{R} \left( r + \frac{1}{2} \right) n \right]. \tag{8.118}
$$
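As a rough illustration of Eqs. (8.117a), (8.117b), and (8.118), the following Python sketch computes a normalized mel-spectrum and the resulting coefficients for one frame. The function names (`triangular_weights`, `mel_spectrum`, `mfcc`), the 512-point DFT, the 8 kHz sampling rate, the Hamming window, and the band-edge layout (100 Hz steps up to 1 kHz, then geometric spacing up to 4 kHz) are all assumptions made for this example; the text does not give the exact filter design behind Fig. 8.41.

```python
import numpy as np

def triangular_weights(edges, n_bins):
    """Triangular weights V_r[k] (hypothetical helper, not the book's exact design).

    `edges` holds R + 2 increasing positions in units of DFT bins; filter r
    rises from edges[r] to a peak at edges[r + 1] and falls to zero at edges[r + 2].
    """
    k = np.arange(n_bins)
    R = len(edges) - 2
    V = np.zeros((R, n_bins))
    for r in range(R):
        lo, mid, hi = edges[r], edges[r + 1], edges[r + 2]
        rising = (k - lo) / (mid - lo)
        falling = (hi - k) / (hi - mid)
        # The smaller of the two ramps forms the triangle; clip negatives to zero.
        V[r] = np.clip(np.minimum(rising, falling), 0.0, None)
    return V

def mel_spectrum(X, V):
    """Normalized mel-spectrum MF_m[r] for one frame's DFT values X."""
    A = np.sum(np.abs(V) ** 2, axis=1)                       # Eq. (8.117b)
    return np.sum(np.abs(V * X[np.newaxis, :]) ** 2, axis=1) / A  # Eq. (8.117a)

def mfcc(MF, n_mfcc):
    """Coefficients mfcc_m[n] for n = 1, ..., n_mfcc, following Eq. (8.118)."""
    R = len(MF)
    r = np.arange(1, R + 1)
    n = np.arange(1, n_mfcc + 1)
    basis = np.cos(2.0 * np.pi / R * np.outer(n, r + 0.5))   # shape (n_mfcc, R)
    return basis @ np.log(MF) / R

# Assumed analysis setup: 8 kHz sampling, 512-point DFT, R = 24, N_mfcc = 13.
fs, n_fft, n_mfcc = 8000, 512, 13
edges_hz = np.concatenate([
    np.linspace(0.0, 1000.0, 11),               # constant spacing below 1 kHz
    1000.0 * 4.0 ** (np.arange(1, 16) / 15.0),  # geometric spacing up to 4 kHz
])
V = triangular_weights(edges_hz * n_fft / fs, n_fft // 2 + 1)

# A synthetic noise frame stands in for a windowed speech frame.
rng = np.random.default_rng(0)
frame = rng.standard_normal(n_fft) * np.hamming(n_fft)
X = np.fft.rfft(frame)

MF = mel_spectrum(X, V)
c = mfcc(MF, n_mfcc)
```

One way to check the normalization is to pass a constant array for `X`: every $\mathrm{MF}_m[r]$ then equals the common squared magnitude, which is the flat-spectrum property described above.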

Typically, $\mathrm{mfcc}_m[n]$ is evaluated for $n = 1, 2, \ldots, N_{\mathrm{mfcc}}$, where $N_{\mathrm{mfcc}}$ is less than the number of mel-filters, e.g., $N_{\mathrm{mfcc}} = 13$ and $R = 24$. Figure 8.42 shows the result of mfcc analysis of a frame of voiced speech in comparison with the short-time spectrum, LPC spectrum, and a homomorphically smoothed spectrum.[21] The large dots are the values of $\log(\mathrm{MF}_m[r])$ and the line interpolated between

[21] The speech signal was pre-emphasized by convolution with $\delta[n] - 0.97\,\delta[n-1]$ prior to analysis so as to equalize the levels of the formant resonances.
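The pre-emphasis step in the footnote can be sketched in a few lines of Python. The function name `pre_emphasize` and the choice to treat the sample before the start of the signal as zero are assumptions for this example; only the impulse response $\delta[n] - 0.97\,\delta[n-1]$ comes from the text.

```python
import numpy as np

def pre_emphasize(x, alpha=0.97):
    """Convolve x with delta[n] - alpha * delta[n - 1].

    Equivalent to y[n] = x[n] - alpha * x[n - 1], taking x[-1] = 0 (an assumption).
    The output is truncated to the length of the input.
    """
    h = np.array([1.0, -alpha])
    return np.convolve(x, h)[: len(x)]

# A constant input is strongly attenuated after the first sample,
# which reflects the high-pass character of the filter.
y = pre_emphasize(np.ones(4))
```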
