Computational Models of Music Similarity and their ... - OFAI
2 Audio-based Similarity Measures
       G30      G30S    G1
FC     25000    700     30.0
CMS    400      2       0.1
Table 2.1: Approximate CPU times in milliseconds on an Intel Pentium M 2GHz
(755) for frame clustering (FC, per piece) and cluster model similarity (CMS, per pair of
pieces). The approximate time for loading a 120 second (22kHz, mono) audio
file in WAV format into Matlab is 0.25 seconds. Computing the
MFCCs takes about 1.5 seconds using no overlap between frames and a segment size of
512 samples (23ms). This yields about 5200 frames for 2 minutes of audio.
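
The frame figures in the caption can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming that "22kHz" means exactly 22050 Hz and that only complete frames are counted; the function name `frame_stats` is introduced here for illustration only.

```python
# Figures taken from the caption of Table 2.1.
# Assumption: "22kHz" is taken to mean a sampling rate of exactly 22050 Hz.
SAMPLE_RATE_HZ = 22050
DURATION_S = 120
FRAME_SIZE = 512  # samples per frame, no overlap between frames


def frame_stats(sample_rate_hz, duration_s, frame_size):
    """Return (frame length in ms, number of complete non-overlapping frames)."""
    frame_ms = 1000.0 * frame_size / sample_rate_hz
    n_frames = (sample_rate_hz * duration_s) // frame_size
    return frame_ms, n_frames


frame_ms, n_frames = frame_stats(SAMPLE_RATE_HZ, DURATION_S, FRAME_SIZE)
print(f"{frame_ms:.1f} ms per frame, {n_frames} frames")
# prints "23.2 ms per frame, 5167 frames"
```

Under these assumptions the result agrees with the rounded values of 23ms and about 5200 frames given in the caption.
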
the system needs to compute all distances of interest very fast to minimize
the system’s response time. Note that the FC time for G30 can easily be
reduced to that of G30S (this is mainly a question of accuracy). However,
there is no way to reduce the computation times of G30 or G30S to those of
G1. G1 is clearly orders of magnitude faster.
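
To illustrate what the times in Table 2.1 mean in practice, the sketch below estimates the total cost of modelling a collection and comparing every pair of pieces. The collection size of 1000 pieces and the function name `total_seconds` are assumptions made for this example; loading the audio and computing the MFCCs are not included.

```python
# CPU times in milliseconds, copied from Table 2.1.
FC_MS = {"G30": 25000.0, "G30S": 700.0, "G1": 30.0}  # frame clustering, per piece
CMS_MS = {"G30": 400.0, "G30S": 2.0, "G1": 0.1}      # cluster model similarity, per pair


def total_seconds(measure, n_pieces):
    """Estimated time to model n_pieces and compare every unordered pair."""
    n_pairs = n_pieces * (n_pieces - 1) // 2
    return (n_pieces * FC_MS[measure] + n_pairs * CMS_MS[measure]) / 1000.0


# Hypothetical collection size, chosen only for illustration.
N_PIECES = 1000
for measure in ("G30", "G30S", "G1"):
    print(f"{measure}: {total_seconds(measure, N_PIECES):.0f} s")
```

For this hypothetical collection the estimate is roughly 62 hours for G30, 28 minutes for G30S, and 80 seconds for G1, which makes the gap between the measures concrete.
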
2.2.3.6 Distance Matrices
Figure 2.14 shows the distance matrices for the 6 songs using the three spectral
similarity measures described in this section. The matrices computed
for G30 and G30S are very similar. In contrast, the differences between
the original and the rescaled distance matrix for G1 are clearly noticeable.
Rescaling G1 is very important when combining the distance matrix with
additional information as discussed in the subsequent sections. Furthermore,
a balanced distance matrix is important for techniques which visualize
whole collections such as the Islands of Music discussed in the next chapter.
However, if only a ranked list of similar pieces is required, then the scaling is
not critical.
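
The last point holds for any strictly increasing rescaling, because such a function changes the distance values but not their order. The sketch below demonstrates this with made-up distances and a hypothetical rescaling function; it is not the rescaling used for G1 in this section, which is not reproduced here.

```python
import math


def rescale(d, scale=1.0):
    """Hypothetical strictly increasing rescaling into [0, 1).

    Chosen only for illustration; not the formula used for G1.
    """
    return 1.0 - math.exp(-d / scale)


def ranking(distances):
    """Indices of the pieces sorted from most to least similar."""
    return sorted(range(len(distances)), key=lambda i: distances[i])


# Made-up distances from one query piece to five other pieces.
raw = [12.0, 3.5, 40.0, 7.25, 19.0]
scaled = [rescale(d, scale=10.0) for d in raw]

# The values differ, but the ranked list is identical.
assert ranking(raw) == ranking(scaled)
print(ranking(raw))  # prints [1, 3, 0, 4, 2]
```

The ranking is unaffected, but sums or weighted combinations of distances are not, which is why the scaling matters once the matrix is combined with other information.
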
Compared to the ZCR results, it seems that one problem has been solved:
the piece of classical music is now differentiated from the other pieces.
However, the hard pop and electronic dance pieces are not distinguishable.
One solution is to add information related to the beats and rhythm, which is
the topic of the next section.
2.2.4 Fluctuation Patterns
Fluctuation Patterns (FPs) describe the amplitude modulation of the loudness
per frequency band [Pam01; PRM02a] and are based on ideas developed
in [Frü01; FR01]. They describe characteristics of the audio signal which are
not described by the spectral similarity measure.
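
The basic idea of measuring amplitude modulation per frequency band can be sketched as follows: treat the loudness of one band as a signal over time and take the magnitude of its Fourier transform. This is a simplified illustration on synthetic data, not the FP computation of [Pam01; PRM02a]; it omits any psychoacoustic processing, and the frame rate, band names, and the function `modulation_spectrum` are assumptions made for the example.

```python
import cmath
import math


def modulation_spectrum(loudness, frame_rate_hz):
    """Magnitude of the DFT of one band's loudness curve.

    Returns (modulation frequency in Hz, amplitude) pairs for the positive
    frequencies up to Nyquist, skipping the constant (0 Hz) term.
    """
    n = len(loudness)
    result = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(loudness))
        result.append((k * frame_rate_hz / n, abs(coeff) / n))
    return result


# Synthetic loudness curves: 64 frames at an assumed 16 frames per second.
FRAME_RATE_HZ = 16.0
N_FRAMES = 64


def beat(freq_hz):
    """Loudness curve that fluctuates sinusoidally at freq_hz."""
    return [1.0 + 0.5 * math.sin(2 * math.pi * freq_hz * t / FRAME_RATE_HZ)
            for t in range(N_FRAMES)]


bands = {"low": beat(2.0), "high": beat(4.0)}

# Strongest modulation frequency per band.
peaks = {name: max(modulation_spectrum(curve, FRAME_RATE_HZ), key=lambda p: p[1])
         for name, curve in bands.items()}
for name, (freq_hz, amplitude) in peaks.items():
    print(f"{name}: strongest modulation at {freq_hz} Hz")
```

In this toy example the low band peaks at 2 Hz and the high band at 4 Hz, which is the kind of rhythmic information that the spectral similarity measures do not capture.
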