Computational Models of Music Similarity and their ... - OFAI
2 Audio-based Similarity Measures
on the expectations. The expectation step is:
\[
P(m \mid x_n, \Theta) = \frac{p(x_n \mid m, \Theta)\, P_m}{p(x_n)}
= \frac{\mathcal{N}(x_n \mid \mu_m, \Sigma_m)\, P_m}{\sum_{m'=1}^{M} \mathcal{N}(x_n \mid \mu_{m'}, \Sigma_{m'})\, P_{m'}}. \tag{2.14}
\]
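The expectation step of Equation 2.14 can be sketched in a few lines. This is a minimal illustration, not the implementation used in this work: it assumes one-dimensional data so that each Gaussian has a scalar variance instead of a covariance matrix, and the names `gaussian_pdf` and `e_step` as well as the toy values are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional normal distribution N(x | mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def e_step(xs, priors, means, variances):
    """Return resp[n][m] = P(m | x_n, Theta), following Equation 2.14.

    Assumption: scalar data and scalar variances (the text uses
    multivariate MFCC frames with covariance matrices).
    """
    resp = []
    for x in xs:
        # Numerator for every component m: N(x_n | mu_m, Sigma_m) * P_m
        weighted = [p * gaussian_pdf(x, mu, var)
                    for p, mu, var in zip(priors, means, variances)]
        # Denominator: the same quantity summed over all components m'
        total = sum(weighted)
        resp.append([w / total for w in weighted])
    return resp


# Toy data: two clusters around -2 and +2, plus one point halfway between.
xs = [-2.1, -1.9, 0.0, 2.0, 2.2]
resp = e_step(xs, priors=[0.5, 0.5], means=[-2.0, 2.0], variances=[1.0, 1.0])
for x, row in zip(xs, resp):
    print(f"x = {x:+.1f}  P(m | x) = {[round(r, 3) for r in row]}")
```

Each row of `resp` sums to one, and the point halfway between the two means is shared equally by both components. For high-dimensional frames the densities are usually evaluated in the log domain to avoid numerical underflow; that refinement is omitted here.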
The maximization step is:
\[
\begin{aligned}
\mu_m^{*} &= \frac{\sum_{n} P(m \mid x_n, \Theta)\, x_n}{\sum_{n'} P(m \mid x_{n'}, \Theta)}, \\
\Sigma_m^{*} &= \frac{\sum_{n} P(m \mid x_n, \Theta)\, (x_n - \mu_m)(x_n - \mu_m)^{T}}{\sum_{n'} P(m \mid x_{n'}, \Theta)}, \\
P_m^{*} &= \frac{1}{N} \sum_{n} P(m \mid x_n, \Theta).
\end{aligned} \tag{2.15}
\]
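The three updates of Equation 2.15 might be written as follows. The sketch is only illustrative: the function name `m_step`, its argument layout, and the toy data are assumptions. The covariance update uses the means passed in as `means`, as the equation is printed; many implementations pass the freshly updated means instead.

```python
def m_step(xs, resp, means):
    """Re-estimate means, covariances and priors following Equation 2.15.

    xs    : list of N frames, each a list of d floats
    resp  : resp[n][m] = P(m | x_n, Theta) from the expectation step
    means : means[m] used as mu_m in the covariance update
    """
    n_frames, dim, n_comp = len(xs), len(xs[0]), len(resp[0])
    new_means, new_covs, new_priors = [], [], []
    for m in range(n_comp):
        # Shared denominator: sum over n' of P(m | x_n', Theta)
        weight = sum(resp[n][m] for n in range(n_frames))

        # Responsibility-weighted mean of the frames
        new_means.append([
            sum(resp[n][m] * xs[n][i] for n in range(n_frames)) / weight
            for i in range(dim)
        ])

        # Responsibility-weighted sum of outer products (x_n - mu_m)(x_n - mu_m)^T
        cov = [[0.0] * dim for _ in range(dim)]
        for n in range(n_frames):
            diff = [xs[n][i] - means[m][i] for i in range(dim)]
            for i in range(dim):
                for j in range(dim):
                    cov[i][j] += resp[n][m] * diff[i] * diff[j]
        new_covs.append([[c / weight for c in row] for row in cov])

        # Prior: average responsibility of component m over all N frames
        new_priors.append(weight / n_frames)
    return new_means, new_covs, new_priors


# Toy example: four 2-D frames with hard (0/1) responsibilities.
xs = [[0.0, 0.0], [2.0, 0.0], [10.0, 4.0], [12.0, 4.0]]
resp = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
means = [[1.0, 0.0], [11.0, 4.0]]
new_means, new_covs, new_priors = m_step(xs, resp, means)
print(new_means)
print(new_covs)
print(new_priors)
```

With hard responsibilities the updates reduce to the per-cluster mean, covariance, and relative cluster size, which makes the result easy to check by hand.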
Cluster Model Similarity
To compute the similarity of pieces A and B, a sample is drawn from each GMM,
$X_A$ and $X_B$ respectively. (It is not feasible to store and use the
original MFCC frames instead, due to memory constraints when dealing with
large collections.) In the remainder of this section a sample size of 2000 is
used. The log-likelihood $L(X \mid \Theta)$ that a sample $X$ was generated by the
model $\Theta$ is computed for each piece/sample combination (Equation 2.13).
Aucouturier and Pachet [AP02a] suggest computing the distance as:
\[
d_{AB} = L(X_A \mid \Theta_A) + L(X_B \mid \Theta_B) - L(X_A \mid \Theta_B) - L(X_B \mid \Theta_A). \tag{2.16}
\]
Note that $L(X_A \mid \Theta_B)$ and $L(X_B \mid \Theta_A)$ are generally different values. However,
a symmetric similarity measure ($d_{AB} = d_{BA}$) is very desirable for most
applications. Thus, both are used for $d_{AB}$. The self-similarity terms
$L(X_A \mid \Theta_A)$ and $L(X_B \mid \Theta_B)$ are added to normalize the results.
In most cases the following statements are true:
$L(X_A \mid \Theta_A) > L(X_A \mid \Theta_B)$ and $L(X_A \mid \Theta_A) > L(X_B \mid \Theta_A)$.
The distance is (slightly) different every time it is computed due to the
randomness of the sampling. Furthermore, if some randomness is used to
initialize the GMM, different models will be produced every time and will
also lead to different distance values.
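The sampling-based distance of Equation 2.16 and its dependence on the random sample can be illustrated with a small sketch. Several assumptions are made here: the GMMs are one-dimensional with made-up parameters, $L(X \mid \Theta)$ is taken to be the mean log-likelihood per sampled value (Equation 2.13 is not reproduced here), and all function names are hypothetical.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional normal distribution N(x | mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def sample(gmm, size, rng):
    """Draw `size` values from a 1-D GMM given as (priors, means, variances)."""
    priors, means, variances = gmm
    # Pick a component for every draw, then sample from that Gaussian.
    comps = rng.choices(range(len(priors)), weights=priors, k=size)
    return [rng.gauss(means[m], math.sqrt(variances[m])) for m in comps]


def log_likelihood(xs, gmm):
    """Mean log-likelihood per value (assumed form of L(X | Theta))."""
    priors, means, variances = gmm
    total = 0.0
    for x in xs:
        total += math.log(sum(p * gaussian_pdf(x, mu, var)
                              for p, mu, var in zip(priors, means, variances)))
    return total / len(xs)


def distance(gmm_a, gmm_b, rng, size=2000):
    """d_AB of Equation 2.16, estimated from one sample per model."""
    x_a = sample(gmm_a, size, rng)
    x_b = sample(gmm_b, size, rng)
    return (log_likelihood(x_a, gmm_a) + log_likelihood(x_b, gmm_b)
            - log_likelihood(x_a, gmm_b) - log_likelihood(x_b, gmm_a))


# Two made-up models standing in for the GMMs of pieces A and B.
gmm_a = ([0.6, 0.4], [0.0, 3.0], [1.0, 1.0])
gmm_b = ([0.5, 0.5], [1.0, 5.0], [1.0, 2.0])

# The same pair of models yields a slightly different distance for each seed.
ds = [distance(gmm_a, gmm_b, random.Random(seed)) for seed in (0, 1, 2)]
for seed, d in zip((0, 1, 2), ds):
    print(f"seed {seed}: d_AB = {d:.4f}")
```

Fixing the seed of the random number generator makes a single distance value reproducible, but it does not remove the underlying variance of the estimate.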
ISMIR 2004 Genre Classification Contest
An implementation of G30 using a nearest neighbor classifier with the following
parameters won the ISMIR 2004 genre classification contest. Three
minutes from the center of each piece (22 kHz, mono) were used for analysis.
The MFCCs were computed using 19 coefficients (the first is ignored). The