Computational Models of Music Similarity and their ... - OFAI
38 2 Audio-based Similarity Measures
the end of this section there are some paragraphs describing illustrations
which help to understand basic characteristics of the FPs.
2.2.4.1 Details
The parameters used are: segment size 128 (about 3 seconds) and hop size 64
(50% overlap). Instead of the 36 frequency bands of the Mel spectrum, only
12 are used. The grouping of frequency bands is described below. The
modulation frequency range of 0 to 10 Hz is resolved into 30 bins. This
results in 360-dimensional (12×30) FPs.
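The arithmetic behind these parameters can be checked with a short sketch. The variable names are illustrative and not taken from the original code. The frame duration (512-sample frames at 22050 Hz without overlap) and the number of frames are assumptions made only for this example; the text above states just that 128 frames correspond to about 3 seconds.

```matlab
% Sketch only: names are illustrative, not from the original implementation.
seg_size = 128;   % frames per segment (from the text)
hop_size = 64;    % frames between segment starts (50% overlap)
n_bands  = 12;    % frequency bands after grouping
n_mod    = 30;    % modulation frequency bins in the range 0 to 10 Hz

% ASSUMPTION: 512-sample frames at 22050 Hz, no frame overlap.
frame_dur = 512 / 22050;            % seconds per frame
seg_dur   = seg_size * frame_dur;   % segment length in seconds (about 3)

% HYPOTHETICAL input length, roughly 30 seconds of audio.
n_frames = 1292;

% Number of complete segments and the first frame index of each one.
n_segs = floor((n_frames - seg_size) / hop_size) + 1;
starts = 1 + (0:n_segs-1) * hop_size;

% Dimensionality of one FP with and without the band grouping.
fp_dim   = n_bands * n_mod;         % 12 x 30
full_dim = 36 * n_mod;              % all 36 Mel bands kept

fprintf('segment: %.2f s, segments: %d, FP dim: %d (vs. %d)\n', ...
        seg_dur, n_segs, fp_dim, full_dim);
```

Under these assumptions, keeping all 36 Mel bands would triple the size of each pattern, which is the memory argument made in the following paragraph.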
The main reason for reducing the frequency resolution is to reduce the
overall dimensionality of the FPs, and thus the memory required to store
one pattern. Furthermore, a high frequency resolution is not necessary. For
example, in [Pam01] 20 frequency bands were used. Analysis of the eigenvectors
showed that the higher frequency bands in particular are highly correlated.
In [Pam05] only 12 frequency bands were used. In particular, the following
mapping was used to group the 36 Mel bands into 12 frequency bands:
    % t(j) is the index (1..12) of the frequency band that Mel band j
    % is assigned to.                                              (2.21)
    t = zeros(1,36);
    t(1)   = 1;  t( 7: 8) = 5;  t(15:18) =  9;
    t(2)   = 2;  t( 9:10) = 6;  t(19:23) = 10;
    t(3:4) = 3;  t(11:12) = 7;  t(24:29) = 11;
    t(5:6) = 4;  t(13:14) = 8;  t(30:36) = 12;

    % Sum the rows of M_dB (36 Mel bands x frames) that belong to each group.
    mel2 = zeros(12,size(M_dB,2));
    for i=1:12,
        mel2(i,:) = sum(M_dB(t==i,:),1);
    end
The actual values are more or less arbitrary. However, they are based on
the observation that interesting details are often in lower frequency bands.
Note that the energy is added up. Thus, the 12th frequency band of mel2
represents the sum of 7 M_dB bands, while the first and second bands each
represent only one. This can be seen in Figure 2.16. In particular, the
frequency bands 9-11 have higher values.
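The effect of summing can be made explicit by counting how many Mel bands fall into each group. The sketch below rebuilds the mapping t from the listing above. The names counts, G, mel2_loop and mel2_mat are illustrative, and the random matrix merely stands in for a real M_dB. The matrix product is shown only as an equivalent formulation, not as the form used in the original code.

```matlab
% Rebuild the grouping vector from the listing above.
t = zeros(1,36);
t(1)   = 1;  t( 7: 8) = 5;  t(15:18) =  9;
t(2)   = 2;  t( 9:10) = 6;  t(19:23) = 10;
t(3:4) = 3;  t(11:12) = 7;  t(24:29) = 11;
t(5:6) = 4;  t(13:14) = 8;  t(30:36) = 12;

% Number of Mel bands summed into each of the 12 frequency bands.
counts = accumarray(t(:), 1)';
disp(counts);   % 1 1 2 2 2 2 2 2 4 5 6 7

% HYPOTHETICAL stand-in for a Mel spectrogram (36 bands x 5 frames).
M_dB = rand(36, 5);

% Grouping as in the listing above.
mel2_loop = zeros(12, size(M_dB,2));
for i = 1:12
    mel2_loop(i,:) = sum(M_dB(t==i,:), 1);
end

% Equivalent matrix form: G(i,j) is 1 if Mel band j belongs to group i.
G = double(bsxfun(@eq, (1:12)', t));
mel2_mat = G * M_dB;
```

Because the upper groups collect between four and seven Mel bands while the lowest two contain a single band each, the summed values in the upper bands are larger for comparable per-band levels.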
The following defines the constants used for computations, in particular
the fluctuation strength weights (flux), the filter which smooths over the
frequency bands (filt1), and the filter which smooths over the modulation