03.05.2014 Views

Computational Models of Music Similarity and their ... - OFAI


2 Audio-based Similarity Measures

the end of this section there are some paragraphs describing illustrations which help understand basic characteristics of the FPs.

2.2.4.1 Details

The parameters used are: segment size 128 (about 3 seconds) and hop size 64 (50% overlap). Instead of the 36 frequency bands of the Mel spectrum only 12 are used. The grouping of frequency bands is described below. The modulation frequency range from 0 to 10Hz is resolved into 30 values. This results in 360-dimensional (12×30) FPs.
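As a rough illustration of these parameters, the following sketch computes where each segment starts and how many values one FP holds. It assumes that segment size and hop size are counted in frames of the Mel spectrum. The variable names and the example value of nFrames are hypothetical and not taken from the text.

```matlab
segSize = 128;   % segment size (about 3 seconds), assumed to be in frames
hopSize = 64;    % hop size (50% overlap)
nBands  = 12;    % frequency bands after grouping
nMod    = 30;    % modulation frequency values in the range 0 to 10Hz

nFrames  = 1000;                               % hypothetical number of frames
segStart = 1:hopSize:(nFrames - segSize + 1);  % first frame of each full segment
nSegs    = numel(segStart);                    % number of full segments
fpDim    = nBands * nMod;                      % dimensionality of one FP
```

Only segments that fit completely into the signal are counted here. How the remaining frames at the end are handled is a design choice that this sketch leaves open.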

The main reason for reducing the frequency resolution is to reduce the overall dimensionality of the FPs, and thus the memory required to store one pattern. Furthermore, a high frequency resolution is not necessary. For example, in [Pam01] 20 frequency bands were used. Analysis of the eigenvectors showed that the higher frequency bands in particular are highly correlated. In [Pam05] only 12 frequency bands were used. In particular, the following mapping was used to group the 36 Mel bands into 12 frequency bands:

    % (2.21) Group the 36 Mel bands into 12 frequency bands.
    % t(k) is the index of the group that Mel band k belongs to.
    t = zeros(1,36);
    t(1)   = 1;  t( 7: 8) = 5;  t(15:18) =  9;
    t(2)   = 2;  t( 9:10) = 6;  t(19:23) = 10;
    t(3:4) = 3;  t(11:12) = 7;  t(24:29) = 11;
    t(5:6) = 4;  t(13:14) = 8;  t(30:36) = 12;

    % mel2 has one row per group and one column per frame of M_dB.
    mel2 = zeros(12,size(M_dB,2));
    for i=1:12
        mel2(i,:) = sum(M_dB(t==i,:),1);   % add up the rows of group i
    end

The actual values are more or less arbitrary. However, they are based on the observation that interesting details are often in lower frequency bands. Note that the energy is added up. Thus, the 12th frequency band of mel2 represents the sum of 7 M_dB bands, while the first and second bands each represent only one. This can be seen in Figure 2.16. In particular, the frequency bands 9-11 have higher values.
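To make the effect of adding up the energy concrete, the following sketch counts how many Mel bands fall into each of the 12 groups and applies the grouping to a constant input. The constant M_dB with 5 frames and the variable counts are assumptions chosen for illustration only.

```matlab
% Mapping from the 36 Mel bands to the 12 frequency bands (as in the text)
t = zeros(1,36);
t(1)   = 1;  t( 7: 8) = 5;  t(15:18) =  9;
t(2)   = 2;  t( 9:10) = 6;  t(19:23) = 10;
t(3:4) = 3;  t(11:12) = 7;  t(24:29) = 11;
t(5:6) = 4;  t(13:14) = 8;  t(30:36) = 12;

% Number of Mel bands that are summed into each of the 12 groups
counts = accumarray(t(:), 1)';

% Hypothetical input: every Mel band has the value 1 in each of 5 frames
M_dB = ones(36, 5);
mel2 = zeros(12, size(M_dB,2));
for i = 1:12
    mel2(i,:) = sum(M_dB(t==i,:), 1);
end

disp(counts);     % group sizes
disp(mel2(:,1)'); % with constant input, each row equals its group size
```

Here counts is [1 1 2 2 2 2 2 2 4 5 6 7], which sums to 36. With a constant input the rows of mel2 reproduce these group sizes, so a band built from more Mel bands takes a larger value even when the underlying spectrum is flat.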

The following defines the constants used for computations, in particular the fluctuation strength weights (flux), the filter which smooths over the frequency bands (filt1), and the filter which smooths over the modulation
