03.05.2014 Views

Computational Models of Music Similarity and their ... - OFAI



2 Audio-based Similarity Measures

representation this might not be trivial. For example, some of the similarity measures described in this section (which use features with a frame-level scope) use Monte Carlo sampling or the Kullback-Leibler divergence to compare pieces.
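To make the idea of comparing pieces through frame-level features more concrete, the following is a minimal sketch and not the method used in this section. It assumes each piece is summarized by a single Gaussian with diagonal covariance fitted to its frame-level feature vectors, and it compares two pieces with the symmetrized Kullback-Leibler divergence, which has a closed form in that case. The function names `fit_diagonal_gaussian`, `kl_diagonal`, and `symmetric_kl` are hypothetical.

```python
import math
from statistics import fmean, pvariance


def fit_diagonal_gaussian(frames):
    """Summarize a piece by per-dimension mean and variance of its frames.

    `frames` is a list of equal-length feature vectors (one per frame).
    Assumption: a single diagonal Gaussian is an adequate summary.
    """
    dims = list(zip(*frames))
    means = [fmean(d) for d in dims]
    # A small floor keeps the variances strictly positive.
    variances = [max(pvariance(d), 1e-12) for d in dims]
    return means, variances


def kl_diagonal(p, q):
    """Closed-form KL(p || q) for two diagonal Gaussians (means, variances)."""
    (mu_p, var_p), (mu_q, var_q) = p, q
    return 0.5 * sum(
        vp / vq + (mq - mp) ** 2 / vq - 1.0 + math.log(vq / vp)
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )


def symmetric_kl(p, q):
    """Symmetrized divergence, usable as a dissimilarity between two pieces."""
    return kl_diagonal(p, q) + kl_diagonal(q, p)
```

When the models are more complex (for example, mixtures of Gaussians), no such closed form exists, which is one reason sampling-based comparisons are used instead.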

Computational Limits

In general it is not possible to model every nerve cell in the human auditory system when processing music archives with terabytes of data. Again the intended application defines the requirements. A similarity measure that runs on a mobile device will have other constraints than one which can be run in parallel on a massive server farm. Furthermore, it makes a big difference if the similarities are computed for a collection of a few hundred pieces, or for a catalog of a few million pieces. Finding the optimal trade-off between required resources (including memory and processing time) and quality might not be trivial.

Structure of this Section

This section is structured as follows. The next subsection gives a simple introduction to similarity computations using the Zero Crossing Rate (ZCR) as an example. Subsection 2.2.2 describes how the time domain representation of the audio signals is transformed to the frequency domain. Subsections 2.2.3–2.2.5 describe different features and how they are used to compute similarity. The main focus is on spectral similarity (which is loosely related to timbre) and Fluctuation Patterns (which are loosely related to rhythmic properties). Subsection 2.2.6 describes how the different approaches are combined linearly. Subsection 2.2.7 describes anomalies in the similarity space. In particular, the triangle inequality does not always hold, and a few pieces are estimated to be highly similar to a very large number of pieces while others are highly dissimilar to almost all other pieces.
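As an illustration of what it means for the triangle inequality not to hold, the sketch below scans a small symmetric distance matrix for triples (a, b, c) where the direct distance d(a, c) exceeds the detour d(a, b) + d(b, c). The function name `triangle_violations` and the example matrix are hypothetical and serve only to demonstrate the check.

```python
from itertools import permutations


def triangle_violations(dist, tol=1e-12):
    """Return triples (a, b, c) with dist[a][c] > dist[a][b] + dist[b][c].

    `dist` is a symmetric square matrix given as a list of lists.
    Only triples with a < c are reported, so each violation appears once.
    """
    n = len(dist)
    violations = []
    for a, b, c in permutations(range(n), 3):
        if a < c and dist[a][c] > dist[a][b] + dist[b][c] + tol:
            violations.append((a, b, c))
    return violations


# Hypothetical distances: piece 1 is close to both 0 and 2,
# yet pieces 0 and 2 are far apart.
example = [
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 1.0],
    [3.0, 1.0, 0.0],
]
violating_triples = triangle_violations(example)
```

The check is cubic in the number of pieces, so on a large collection it would only be practical on a random sample of triples.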

2.2.1 The Basic Idea (ZCR Illustration)

This subsection illustrates the concept of audio-based music similarity using the Zero Crossing Rate (ZCR) as an example. The ZCR is very simple to compute and has been applied to speech processing to distinguish voiced sections from noise. Furthermore, it has been applied to MIR tasks such as classifying percussive sounds or genres. For example, the winning entry of the MIREX 2005 genre classification contest used the ZCR among other features.⁷

7 The MIREX contest will be discussed in more detail in Subsection 2.3.1.1
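A minimal sketch of the ZCR illustration follows. It assumes the common definition of the ZCR as the number of sign changes of the time-domain signal per second, and it assumes (purely for illustration) that two pieces are compared by the absolute difference of their ZCR values. The names `zero_crossing_rate` and `zcr_distance` are hypothetical, and the test tone stands in for real audio.

```python
import math


def zero_crossing_rate(samples, sample_rate):
    """Number of sign changes per second in a mono signal.

    `samples` is a sequence of amplitudes, `sample_rate` is in Hz.
    A sample equal to zero is treated as non-negative.
    """
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    duration = len(samples) / sample_rate  # length in seconds
    return crossings / duration


def zcr_distance(zcr_a, zcr_b):
    """Toy dissimilarity: absolute difference of two ZCR values."""
    return abs(zcr_a - zcr_b)


# A synthetic 440 Hz sine tone, one second long at 44.1 kHz.
# A pure tone crosses zero roughly twice per period.
sample_rate = 44100
tone = [
    math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)
]
tone_zcr = zero_crossing_rate(tone, sample_rate)
```

Because a single number per piece discards almost everything about the music, a distance of this kind can only serve as an introductory example of the general pattern: extract a feature from each piece, then compare the features.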
