03.05.2014 Views

Computational Models of Music Similarity and their ... - OFAI


2 Audio-based Similarity Measures

on the expectations. The expectation step is:

$$
P(m \mid x_n, \Theta) = \frac{p(x_n \mid m, \Theta)\, P_m}{p(x_n)} = \frac{\mathcal{N}(x_n \mid \mu_m, \Sigma_m)\, P_m}{\sum_{m'=1}^{M} \mathcal{N}(x_n \mid \mu_{m'}, \Sigma_{m'})\, P_{m'}}. \tag{2.14}
$$

The maximization step is:

$$
\begin{aligned}
\mu_m^{*} &= \frac{\sum_{n} P(m \mid x_n, \Theta)\, x_n}{\sum_{n'} P(m \mid x_{n'}, \Theta)}, \\
\Sigma_m^{*} &= \frac{\sum_{n} P(m \mid x_n, \Theta)\, (x_n - \mu_m)(x_n - \mu_m)^{T}}{\sum_{n'} P(m \mid x_{n'}, \Theta)}, \\
P_m^{*} &= \frac{1}{N} \sum_{n} P(m \mid x_n, \Theta).
\end{aligned} \tag{2.15}
$$
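The two steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: it treats the one-dimensional case, so the outer product in Equation 2.15 reduces to a squared deviation, and it computes the variance around the updated mean, which is the common convention, whereas Equation 2.15 as printed writes µ_m. The names `normal_pdf` and `em_step` and the toy data are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(x | mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_step(xs, means, variances, priors):
    """One EM iteration for a 1-D GMM; returns updated (means, variances, priors)."""
    n_comp = len(means)
    # E-step (Eq. 2.14): posterior P(m | x_n, Theta) for each frame and component.
    post = []
    for x in xs:
        joint = [normal_pdf(x, means[m], variances[m]) * priors[m]
                 for m in range(n_comp)]
        total = sum(joint)  # p(x_n), the denominator of Eq. 2.14
        post.append([j / total for j in joint])
    # M-step (Eq. 2.15): posterior-weighted mean, variance and prior.
    new_means, new_vars, new_priors = [], [], []
    for m in range(n_comp):
        weight = sum(p[m] for p in post)
        mu = sum(p[m] * x for p, x in zip(post, xs)) / weight
        # Assumption: deviations are taken around the updated mean.
        var = sum(p[m] * (x - mu) ** 2 for p, x in zip(post, xs)) / weight
        new_means.append(mu)
        new_vars.append(var)
        new_priors.append(weight / len(xs))
    return new_means, new_vars, new_priors

# Toy data (made up for illustration): two well separated groups of values.
xs = [-2.1, -1.9, -2.0, -2.2, -1.8, 3.0, 3.1, 2.9, 3.2, 2.8]
means, variances, priors = [-1.0, 1.0], [1.0, 1.0], [0.5, 0.5]
for _ in range(20):
    means, variances, priors = em_step(xs, means, variances, priors)
```

A practical implementation would work on multivariate MFCC frames, evaluate the densities in the log domain, and guard against collapsing variances; none of that is shown here.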

Cluster Model Similarity

To compute the similarity of pieces A and B a sample from each GMM is drawn, $X_A$ and $X_B$ respectively. (It is not feasible to store and use the original MFCC frames instead, due to memory constraints when dealing with large collections.) In the remainder of this section a sample size of 2000 is used. The log-likelihood $L(X \mid \Theta)$ that a sample $X$ was generated by the model $\Theta$ is computed for each piece/sample combination (Equation 2.13). Aucouturier and Pachet [AP02a] suggest computing the distance as:

$$
d_{AB} = L(X_A \mid \Theta_A) + L(X_B \mid \Theta_B) - L(X_A \mid \Theta_B) - L(X_B \mid \Theta_A). \tag{2.16}
$$

Note that $L(X_A \mid \Theta_B)$ and $L(X_B \mid \Theta_A)$ are generally different values. However, a symmetric similarity measure ($d_{AB} = d_{BA}$) is very desirable for most applications. Thus, both are used for $d_{AB}$. The self-similarity is added to normalize the results. In most cases the following statements are true: $L(X_A \mid \Theta_A) > L(X_A \mid \Theta_B)$ and $L(X_A \mid \Theta_A) > L(X_B \mid \Theta_A)$.
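Both properties can be read directly off Equation 2.16. The short check below assumes that the same samples $X_A$ and $X_B$ are reused for every term. Swapping A and B only reorders the terms:

$$
d_{BA} = L(X_B \mid \Theta_B) + L(X_A \mid \Theta_A) - L(X_B \mid \Theta_A) - L(X_A \mid \Theta_B) = d_{AB}.
$$

Comparing a piece with itself gives zero, which is the sense in which the self-similarity terms normalize the result:

$$
d_{AA} = 2\,L(X_A \mid \Theta_A) - 2\,L(X_A \mid \Theta_A) = 0.
$$

Grouping the terms as

$$
d_{AB} = \bigl[L(X_A \mid \Theta_A) - L(X_A \mid \Theta_B)\bigr] + \bigl[L(X_B \mid \Theta_B) - L(X_B \mid \Theta_A)\bigr]
$$

shows that the distance is positive whenever each sample is more likely under its own model than under the other one.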

The distance is (slightly) different every time it is computed due to the randomness of the sampling. Furthermore, if some randomness is used to initialize the GMM, different models will be produced every time and will also lead to different distance values.
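The procedure described in this section (draw a sample from each model, evaluate the four log-likelihoods, combine them as in Equation 2.16) can be sketched as follows. This is a hedged illustration only: it uses one-dimensional mixtures with made-up parameters, and it assumes that the log-likelihood of Equation 2.13 is the sum of the per-frame log densities. The helper names `sample_gmm`, `log_density`, `log_likelihood` and `distance` are hypothetical.

```python
import math
import random

def sample_gmm(model, size, rng):
    """Draw `size` values from a 1-D GMM given as (means, variances, priors)."""
    means, variances, priors = model
    comps = rng.choices(range(len(means)), weights=priors, k=size)
    return [rng.gauss(means[m], math.sqrt(variances[m])) for m in comps]

def log_density(x, model):
    """log p(x | Theta), computed with log-sum-exp to avoid underflow."""
    means, variances, priors = model
    terms = [math.log(p) - 0.5 * math.log(2.0 * math.pi * v)
             - 0.5 * (x - mu) ** 2 / v
             for mu, v, p in zip(means, variances, priors)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def log_likelihood(xs, model):
    """L(X | Theta), assumed here to be the sum of per-frame log densities."""
    return sum(log_density(x, model) for x in xs)

def distance(xa, xb, model_a, model_b):
    """d_AB as in Eq. 2.16."""
    return (log_likelihood(xa, model_a) + log_likelihood(xb, model_b)
            - log_likelihood(xa, model_b) - log_likelihood(xb, model_a))

# Two made-up models: (means, variances, priors).
model_a = ([0.0, 4.0], [1.0, 1.0], [0.5, 0.5])
model_b = ([1.0, 6.0], [1.0, 2.0], [0.3, 0.7])

rng = random.Random(0)                 # fixed seed so the run is repeatable
x_a = sample_gmm(model_a, 2000, rng)   # sample size 2000, as in the text
x_b = sample_gmm(model_b, 2000, rng)

d_ab = distance(x_a, x_b, model_a, model_b)
d_ba = distance(x_b, x_a, model_b, model_a)
d_aa = distance(x_a, x_a, model_a, model_a)
```

Changing the seed changes `d_ab` slightly, which is the sampling effect noted above. With the samples held fixed, `d_ab` and `d_ba` agree up to floating-point rounding.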

ISMIR 2004 Genre Classification Contest<br />

An implementation of G30 using a nearest neighbor classifier with the following parameters won the ISMIR 2004 genre classification contest. Three minutes from the center of each piece (22kHz, mono) were used for analysis. The MFCCs were computed using 19 coefficients (the first is ignored). The
