Computational Models of Music Similarity and their ... - OFAI
2 Audio-based Similarity Measures
resentation this might not be trivial. For example, some of the similarity
measures described in this section (which use features with a frame-level
scope) use Monte Carlo sampling or the Kullback-Leibler divergence to compare
pieces.
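As a rough illustration of comparing pieces through frame-level features, the sketch below fits one Gaussian per piece and compares the two with a symmetrized Kullback-Leibler divergence. The diagonal covariance, the variance floor, and the function names `fit_diag_gaussian`, `kl_diag`, and `symmetric_kl` are assumptions made for this example; they are not taken from the measures described in this section.

```python
import math
from statistics import fmean, pvariance


def fit_diag_gaussian(frames):
    """Fit a per-dimension mean and variance to a list of frame feature vectors."""
    dims = list(zip(*frames))
    means = [fmean(d) for d in dims]
    # Assumed variance floor so that constant dimensions do not divide by zero.
    variances = [max(pvariance(d), 1e-9) for d in dims]
    return means, variances


def kl_diag(p, q):
    """KL(p || q) for two diagonal-covariance Gaussians given as (means, variances)."""
    (mean_p, var_p), (mean_q, var_q) = p, q
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )


def symmetric_kl(p, q):
    """Symmetrized divergence: KL(p || q) + KL(q || p)."""
    return kl_diag(p, q) + kl_diag(q, p)


# Two toy "pieces", each with two frames of two-dimensional features.
piece_a = fit_diag_gaussian([[0.0, 1.0], [2.0, 3.0]])
piece_b = fit_diag_gaussian([[1.0, 1.0], [3.0, 3.0]])
distance = symmetric_kl(piece_a, piece_b)
```

A closed form like this exists only for single Gaussians. For richer frame-level models no closed form is generally available, which is one reason a sampling-based comparison might be used instead.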
Computational Limits
In general it is not possible to model every nerve cell in the human auditory
system when processing music archives with terabytes of data. Again, the
intended application defines the requirements. A similarity measure that
runs on a mobile device faces different constraints than one that can be run
in parallel on a massive server farm. Furthermore, it makes a big difference
whether the similarities are computed for a collection of a few hundred pieces
or for a catalog of a few million pieces. Finding the optimal trade-off between
required resources (including memory and processing time) and quality might
not be trivial.
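To make the scale difference concrete, the following sketch counts the pairwise comparisons needed for a full similarity matrix. It assumes a symmetric measure stored as 4-byte values; the collection sizes 500 and 5,000,000 are illustrative stand-ins for "a few hundred" and "a few million", and `pairwise_cost` is a hypothetical helper.

```python
def pairwise_cost(n_pieces, bytes_per_value=4):
    """Return (unordered pairs, bytes needed to store one value per pair)."""
    pairs = n_pieces * (n_pieces - 1) // 2
    return pairs, pairs * bytes_per_value


for n in (500, 5_000_000):
    pairs, size = pairwise_cost(n)
    print(f"{n:,} pieces: {pairs:,} pairs, {size / 1e9:,.4f} GB")
```

Because the number of pairs grows quadratically, a catalog 10,000 times larger needs roughly 100 million times more comparisons and storage, so an exhaustive matrix quickly stops being practical.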
Structure of this Section
This section is structured as follows. The next subsection gives a simple
introduction to similarity computations using the Zero Crossing Rate (ZCR) as
an example. Subsection 2.2.2 describes how the time domain representation of
the audio signals is transformed to the frequency domain. Subsections 2.2.3–2.2.5
describe different features and how they are used to compute similarity.
The main focus is on spectral similarity (which is loosely related to timbre)
and Fluctuation Patterns (which are loosely related to rhythmic properties).
Subsection 2.2.6 describes how the different approaches are combined
linearly. Subsection 2.2.7 describes anomalies in the similarity space. In
particular, the triangle inequality does not always hold, and a few pieces are
estimated to be highly similar to a very large number of pieces while others
are highly dissimilar to almost all other pieces.
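Both anomalies can be checked on a small distance matrix. The sketch below is only an illustration: the brute-force triple scan, the use of k-nearest-neighbour counts to spot pieces that are similar to many others, and the names `triangle_violations` and `k_occurrence` are assumptions for this example, not the analysis used in Subsection 2.2.7.

```python
from itertools import permutations


def triangle_violations(dist):
    """Count ordered triples (i, j, k) of distinct pieces with d(i,k) > d(i,j) + d(j,k)."""
    n = len(dist)
    return sum(
        1
        for i, j, k in permutations(range(n), 3)
        if dist[i][k] > dist[i][j] + dist[j][k] + 1e-12  # small tolerance for rounding
    )


def k_occurrence(dist, k):
    """For each piece, count how often it is among the k nearest neighbours of another piece."""
    n = len(dist)
    counts = [0] * n
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        for j in others[:k]:
            counts[j] += 1
    return counts


# Toy matrix: pieces 0 and 2 are both close to piece 1 but far from each other.
dist = [
    [0.0, 1.0, 5.0],
    [1.0, 0.0, 1.0],
    [5.0, 1.0, 0.0],
]
violations = triangle_violations(dist)
neighbour_counts = k_occurrence(dist, k=1)
```

In a larger collection, a piece with a very high count would be one that is estimated to be similar to many others, while a count of zero marks a piece that is never among the nearest neighbours of any other piece. The triple scan is cubic in the number of pieces and is only meant for small examples.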
2.2.1 The Basic Idea (ZCR Illustration)<br />
This subsection illustrates the concept of audio-based music similarity using
the Zero Crossing Rate (ZCR) as an example. The ZCR is very simple to compute
and has been applied to speech processing to distinguish voiced sections from
noise. Furthermore, it has been applied to MIR tasks such as classifying
percussive sounds or genres. For example, the winning entry of the MIREX
2005 genre classification contest used the ZCR among other features. [7]
[7] The MIREX contest will be discussed in more detail in Subsection 2.3.1.1
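A minimal sketch of a ZCR computation is given below, assuming the signal is a list of amplitude samples and the rate is reported in crossings per second. The unit, the treatment of exact zeros as non-negative, and the name `zero_crossing_rate` are choices made for this example.

```python
import math


def zero_crossing_rate(samples, sample_rate):
    """Zero crossings per second, counted as sign changes between consecutive samples."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings * sample_rate / len(samples)


# One second of a 100 Hz sine sampled at 8 kHz; the phase offset avoids exact zeros.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 100 * n / sample_rate + 0.1) for n in range(sample_rate)]
tone_zcr = zero_crossing_rate(tone, sample_rate)
```

A pure tone crosses zero about twice per period, so its ZCR is roughly twice its frequency, whereas noisy signals tend to produce much higher values. Two pieces could then be compared, very crudely, by the absolute difference of their ZCR values.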