03.05.2014 Views

Computational Models of Music Similarity and their ... - OFAI


2 Audio-based Similarity Measures

[Figure 2.7, image not reproduced. Panel titles: DCT Matrix, Mel Frequency Cepstral Coefficients, Reconstructed. Plot labels: mfcc, M_dB_rec.]

Figure 2.7: DCT matrix, MFCCs, and reconstructed Mel power spectrum (dB) for the audio signal used in Figure 2.3. High values in the DCT matrix are visualized as black. The dimensions of the DCT matrix are DCT coefficients on the y-axis and Mel frequency bands on the x-axis.

2.2.2.4 DCT

The Discrete Cosine Transform (DCT) is applied to compress the Mel power spectrum. In particular, the num_filt (e.g. 36) frequency bands are represented by num_coeffs (e.g. 20) coefficients. Alternatives include, e.g., Principal Component Analysis. A side effect of the compression is that the spectrum is smoothed along the frequency axis, which can be interpreted as a simple approximation of the spectral masking in the human auditory system.

The DCT matrix can be computed as follows. [11]

    num_coeffs = 20;                    % number of DCT coefficients kept   (2.8)

    % num_filt (e.g. 36), the number of Mel bands, must already be defined.
    DCT = 1/sqrt(num_filt/2) * ...
          cos((0:num_coeffs-1)'*(0.5:num_filt)*pi/num_filt);
    DCT(1,:) = DCT(1,:)*sqrt(2)/2;      % rescale the first row to unit norm
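Written out entry by entry, the listing above appears to build the following matrix. The symbols D, N (for num_filt), K (for num_coeffs), i, j, and c_i are notation introduced here for illustration and do not come from the text.

```latex
D_{ij} = c_i \sqrt{\frac{2}{N}}\,
         \cos\!\left(\frac{\pi\,(i-1)\,\bigl(j-\tfrac{1}{2}\bigr)}{N}\right),
\qquad i = 1,\dots,K, \quad j = 1,\dots,N,
\qquad
c_i = \begin{cases} 1/\sqrt{2}, & i = 1,\\ 1, & i > 1. \end{cases}
```

With this scaling every entry of the first row equals 1/\sqrt{N}, and each row has unit length.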

Thus the DCT matrix has num_coeffs rows and num_filt columns. Each of the rows corresponds to an eigenvector, starting with the most important one (highest eigenvalue) in the first row. The first eigenvector describes the mean of the spectrum. The second describes a spectral pattern with high energy in the lower half of the frequencies and low energy in the upper half. The eigenvectors are orthogonal. The DCT is applied to the Mel power spectrum (in decibels) as follows:

    mfcc = DCT * M_dB;                  % (2.9)
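The properties described above can be checked numerically. The following is a minimal sketch in plain Python (a port of the MATLAB listing, not the original code). Several parts are assumptions made for illustration: the helper names `dct_matrix`, `matvec`, `transpose`, and `roughness` are hypothetical, the spectrum `M_dB` is a made-up single frame rather than data from the text, and reconstructing `M_dB_rec` by multiplying with the transposed DCT matrix is one plausible reading of the "reconstructed" panel in Figure 2.7, which this section does not spell out.

```python
import math


def dct_matrix(num_coeffs, num_filt):
    """Port of the MATLAB listing: one row per DCT coefficient."""
    scale = 1.0 / math.sqrt(num_filt / 2.0)
    rows = []
    for i in range(num_coeffs):
        row = [scale * math.cos(i * (j + 0.5) * math.pi / num_filt)
               for j in range(num_filt)]
        if i == 0:
            # Same rescaling as DCT(1,:) = DCT(1,:)*sqrt(2)/2 in the listing.
            row = [v * math.sqrt(2.0) / 2.0 for v in row]
        rows.append(row)
    return rows


def matvec(matrix, vector):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def roughness(x):
    """Sum of squared differences between neighbouring bands."""
    return sum((b - a) ** 2 for a, b in zip(x, x[1:]))


num_filt, num_coeffs = 36, 20
DCT = dct_matrix(num_coeffs, num_filt)

# Made-up single-frame Mel spectrum in dB: a falling slope plus a
# band-to-band zigzag. Illustrative values only, not from the text.
M_dB = [60.0 - j + (10.0 if j % 2 == 0 else 0.0) for j in range(num_filt)]

mfcc = matvec(DCT, M_dB)                 # Eq. (2.9) for a single frame
M_dB_rec = matvec(transpose(DCT), mfcc)  # assumed reconstruction step

print(len(mfcc), "coefficients for", len(M_dB), "bands")
print("roughness before:", roughness(M_dB))
print("roughness after: ", roughness(M_dB_rec))
```

In this sketch the first coefficient is the mean of the spectrum scaled by sqrt(num_filt), and the reconstructed frame varies less from band to band than the input, which is the smoothing along the frequency axis mentioned earlier.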

The effects of the DCT matrix on the Mel power spectrum are shown in Figure 2.7. The resulting MFCCs are a compressed representation of the

[11] This code is based on Malcolm Slaney’s Auditory Toolbox.
