03.05.2014 Views

Computational Models of Music Similarity and their ... - OFAI


2 Audio-based Similarity Measures

        G30      G30S     G1
FC      25000    700      30.0
CMS     400      2        0.1

Table 2.1: Approximate CPU times in milliseconds on an Intel Pentium M 2GHz (755) for frame clustering (FC, per piece) and cluster model similarity (CMS, per pair of pieces). Loading a 120-second (22kHz, mono) audio file in WAV format into Matlab takes approximately 0.25 seconds. Computing the MFCCs takes about 1.5 seconds using no overlap between frames and a segment size of 512 samples (23ms). This yields about 5200 frames for 2 minutes of audio.
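
The frame count and frame duration quoted in the caption can be checked with a short calculation. This is a sketch that assumes "22kHz" means a sampling rate of 22050 Hz, which the caption does not state explicitly; the variable names are illustrative.

```python
# Assumption: "22kHz" is taken to mean 22050 Hz (not stated in the caption).
sample_rate_hz = 22050
duration_s = 120      # length of the audio file in seconds
segment_size = 512    # samples per frame, no overlap between frames

# Duration of one frame in milliseconds (the caption rounds this to 23ms).
frame_duration_ms = 1000.0 * segment_size / sample_rate_hz

# Number of complete, non-overlapping frames (the caption rounds this to 5200).
n_frames = (sample_rate_hz * duration_s) // segment_size

print(f"{frame_duration_ms:.2f} ms per frame, {n_frames} frames")
```

Under this assumption the result is consistent with the rounded figures in the caption.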

the system needs to compute all distances of interest very fast to minimize the system’s response time. Note that the FC time for G30 can easily be reduced to that of G30S (this is mainly a question of accuracy). However, there is no way to reduce the computation times of G30 or G30S to those of G1. G1 is clearly orders of magnitude faster.
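
To make the effect on response time concrete, the per-piece and per-pair times from Table 2.1 can be scaled up to a whole collection. The following is a rough sketch: the function names and the collection size of 1000 pieces are hypothetical, and the model ignores loading, MFCC computation, and any other overhead.

```python
# Approximate CPU times in milliseconds from Table 2.1:
# (frame clustering per piece, cluster model similarity per pair).
TIMES_MS = {
    "G30": (25000.0, 400.0),
    "G30S": (700.0, 2.0),
    "G1": (30.0, 0.1),
}

def query_time_ms(measure, n_pieces):
    """Time to compare one query piece with all other pieces (CMS only)."""
    _, cms = TIMES_MS[measure]
    return cms * (n_pieces - 1)

def full_matrix_time_ms(measure, n_pieces):
    """Time to cluster every piece once and compare every unordered pair."""
    fc, cms = TIMES_MS[measure]
    n_pairs = n_pieces * (n_pieces - 1) // 2
    return fc * n_pieces + cms * n_pairs

# Hypothetical collection size, chosen only for illustration.
n = 1000
for name in TIMES_MS:
    print(f"{name}: query {query_time_ms(name, n) / 1000.0:.1f} s, "
          f"full matrix {full_matrix_time_ms(name, n) / 3.6e6:.2f} h")
```

Because the number of pairs grows quadratically with the collection size, the per-pair CMS time dominates the full-matrix cost for larger collections.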

2.2.3.6 Distance Matrices<br />

Figure 2.14 shows the distance matrices for the 6 songs using the three spectral similarity measures described in this section. The matrices computed for G30 and G30S are very similar. Furthermore, the differences between the original and the rescaled distance matrix for G1 are clearly noticeable. Rescaling G1 is very important when combining the distance matrix with additional information, as discussed in the subsequent sections. A balanced distance matrix is also important for techniques which visualize whole collections, such as the Islands of Music discussed in the next chapter. However, if only a ranked list of similar pieces is required, then the scaling is not critical.
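
The last point can be illustrated with a small sketch: any strictly increasing rescaling of the distances leaves the ranked list unchanged. The distance values and the `rescale` function below are hypothetical and chosen only for illustration; they are not the rescaling applied to G1 in this section.

```python
import math

def rank_by_distance(distances):
    """Return piece indices ordered from most to least similar."""
    return sorted(range(len(distances)), key=lambda i: distances[i])

def rescale(d, scale=100.0):
    """A hypothetical strictly increasing rescaling (an assumption,
    not the rescaling used for G1 in the text)."""
    return 1.0 - math.exp(-d / scale)

# Hypothetical distances from one query piece to six pieces (index 0 is the query).
raw = [0.0, 310.5, 42.0, 980.2, 125.7, 57.3]
rescaled = [rescale(d) for d in raw]

print(rank_by_distance(raw))
print(rank_by_distance(rescaled))
```

Both calls print the same ordering. The rescaled values differ, which is what matters once distances are combined with other information or used for visualization.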

Compared to the ZCR results, it seems that one problem has been solved: the piece of classical music is now differentiated from the other pieces. However, the hard pop and electronic dance pieces are not distinguishable. One solution is to add information related to the beats and rhythm, which is the topic of the next section.

2.2.4 Fluctuation Patterns<br />

Fluctuation Patterns (FPs) describe the amplitude modulation of the loudness per frequency band [Pam01; PRM02a] and are based on ideas developed in [Frü01; FR01]. They describe characteristics of the audio signal which are not described by the spectral similarity measure.
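
The idea of measuring the amplitude modulation of the loudness per frequency band can be sketched as a Fourier transform along the time axis of each band's loudness curve. The following is only a simplified illustration on synthetic data: the function name, the two bands, and the modulation rate are hypothetical, and the sketch omits whatever further processing the FPs of [Pam01; PRM02a] apply.

```python
import cmath
import math

def modulation_spectrum(loudness):
    """Magnitude of the DFT of one band's loudness curve (bins 0..n//2),
    normalized by the number of frames."""
    n = len(loudness)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(loudness))
        spectrum.append(abs(acc) / n)
    return spectrum

# Synthetic loudness curves for two hypothetical frequency bands.
n_frames = 64
steady = [1.0] * n_frames  # constant loudness, no modulation
pulsing = [1.0 + 0.5 * math.cos(2 * math.pi * 4 * t / n_frames)
           for t in range(n_frames)]  # 4 modulation cycles in the window

# One modulation spectrum per band: rows are bands, columns are modulation bins.
pattern = [modulation_spectrum(band) for band in (steady, pulsing)]

# Strongest non-constant modulation bin of the pulsing band.
peak_bin = max(range(1, len(pattern[1])), key=lambda k: pattern[1][k])
print(peak_bin)
```

The steady band has energy only in bin 0, while the pulsing band shows a peak at the bin matching its modulation rate. A regular beat would show up in the same way in such a representation.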
