Computational Models of Music Similarity and their ... - OFAI

2 Audio-based Similarity Measures

a classifier and is the main weak point. To describe a broad range of music, a huge number of clusters is necessary. Furthermore, there is no guarantee that music not used to train the clusters can be described meaningfully.

The first localized approach was presented by Logan and Salomon [LS01]. For each piece an individual set of clusters is used. The distances between these are computed using the Kullback-Leibler Divergence combined with the Earth Mover's Distance [RTG00]. Aucouturier and Pachet suggested using the computationally more expensive Monte Carlo sampling instead [AP02a; AP04a]. A simplified approach using a fast approximation of the Monte Carlo sampling was presented in [Pam05]. Mandel and Ellis [ME05] propose an even simpler approach using only one cluster per piece and comparing them using the Kullback-Leibler Divergence. All three are described in detail in this subsection.

Alternative techniques to compute spectral similarity include, for example, the anchor space similarity [BEL03], the spectrum histograms [PDW03a], or simply using the mean and standard deviations of the MFCCs (e.g. [TC02]).

2.2.3.2 Thirty Gaussians and Monte Carlo Sampling (G30)

This approach was originally presented in [AP02a]. Extensive evaluation results were reported in [AP04a]. A Matlab implementation based on these won the ISMIR 2004 genre classification contest [Pam04] (see footnote 12).

The approach consists of two steps: clustering the frames and computing the cluster model similarity. First, the various spectra (frames represented by MFCCs) which occur in the piece are summarized (i.e., the distribution is modeled) by using a clustering algorithm to find typical spectra (cluster centers) and describing how typical they are (prior probabilities), and how the other spectra vary with respect to these few typical spectra (variances).
Second, to compute the distance between two pieces, the distributions of their spectra are compared. If two pieces have similar distributions (i.e., if their spectra can be described using similar typical spectra, with similar variances and priors), they are assumed to be similar.

Footnote 12: http://ismir2004.ismir.net/genre_contest/index.htm

Frame Clustering

The frames are clustered using a Gaussian Mixture Model (GMM) and Expectation Maximization (see e.g. [Bis95]). GMMs are a standard technique used for clustering with soft assignments, or for modeling probability density distributions. A reference implementation can be found, e.g., in the Netlab toolbox [Nab01] for Matlab. A multivariate Gaussian probability density function is defined as:

$$\mathcal{N}(x \mid \mu, \Sigma) = \left(\frac{1}{2\pi}\right)^{p/2} |\Sigma|^{-1/2} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right), \tag{2.11}$$

where $x$ is the observation (19-dimensional MFCC frame), $p$ is the dimensionality of $x$ (here $p = 19$), $\mu$ is the mean (19-dimensional vector describing a typical spectrum), and $\Sigma$ is a 19×19 covariance matrix (for G30 only a diagonal covariance is used, i.e., only values on the diagonal are nonzero).

A GMM is a mixture of $M$ Gaussians where the contribution of the $m$-th component is weighted by a prior $P_m$ with $P_m \geq 0$ and $\sum_m P_m = 1$:

$$p(x \mid \Theta) = \sum_{m=1}^{M} P_m \, \mathcal{N}(x \mid \mu_m, \Sigma_m), \tag{2.12}$$

where $\Theta$ are the parameters that need to be estimated per GMM (i.e., per piece of music): $\Theta = \{\mu_m, \Sigma_m, P_m \mid m = 1, \dots, M\}$. Examples of what such a model looks like for a piece of music are shown in the last paragraph of this subsection.

The optimal estimate for $\Theta$ maximizes the likelihood that the frames $X = \{x_1, \dots, x_N\}$ were generated by the GMM, where $N$ is the number of frames (which is about 5200 for the parameters used in the preprocessing). The standard measure used is the log-likelihood, which is computed as:

$$L(X \mid \Theta) = \log \prod_n p(x_n \mid \Theta) = \sum_n \log p(x_n \mid \Theta). \tag{2.13}$$

To find good estimates for $\Theta$, a standard approach is to use the Expectation Maximization (EM) algorithm. The EM algorithm is iterative (based on an old estimate a better estimate is computed) and converges relatively fast, after a few iterations. The initial estimates can be completely random, or can be computed by using other clustering algorithms such as k-means. (Alternatively, as suggested in [LS01], k-means can be used alone to cluster the frames.)

The EM algorithm consists of two steps. First, the expectation is computed. That is, the probability (expectation) that an observation $x_n$ was generated by the $m$-th component.
Second, the expectation (and thus the likelihood) is maximized. That is, the parameters in Θ are recomputed based
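
The diagonal-covariance case of Equations 2.11 to 2.13 and one EM iteration can be illustrated with a minimal sketch. This is not the Matlab or Netlab implementation referred to in the text. It uses only the Python standard library, and all function names (`log_gauss_diag`, `log_sum_exp`, `log_likelihood`, `em_step`) are hypothetical. The M-step follows the standard textbook update rules for GMMs. The variance floor `var_floor`, the toy 2-dimensional frames, and the choice of M = 2 components are assumptions made for illustration only (G30 uses 19-dimensional MFCC frames and 30 components).

```python
import math

def log_gauss_diag(x, mean, var):
    """Log of Eq. 2.11 for a diagonal covariance; var holds the diagonal."""
    p = len(x)
    log_det = sum(math.log(v) for v in var)
    maha = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    return -0.5 * (p * math.log(2.0 * math.pi) + log_det + maha)

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def component_logs(x, priors, means, variances):
    """log(P_m * N(x | mu_m, Sigma_m)) for every component m."""
    return [math.log(P) + log_gauss_diag(x, mu, var)
            for P, mu, var in zip(priors, means, variances)]

def log_likelihood(frames, priors, means, variances):
    """Eq. 2.13: sum over frames of log p(x_n | Theta), with p from Eq. 2.12."""
    return sum(log_sum_exp(component_logs(x, priors, means, variances))
               for x in frames)

def em_step(frames, priors, means, variances, var_floor=1e-6):
    """One EM iteration (assumes every component keeps a nonzero weight)."""
    n_frames, dim = len(frames), len(frames[0])
    # E-step: probability that frame n was generated by component m.
    resp = []
    for x in frames:
        logs = component_logs(x, priors, means, variances)
        norm = log_sum_exp(logs)
        resp.append([math.exp(value - norm) for value in logs])
    # M-step: recompute priors, means, and variances from the weighted frames.
    new_priors, new_means, new_vars = [], [], []
    for m in range(len(priors)):
        weight = sum(r[m] for r in resp)
        mean = [sum(r[m] * x[d] for r, x in zip(resp, frames)) / weight
                for d in range(dim)]
        var = [max(var_floor,  # floor is an added safeguard, not from the text
                   sum(r[m] * (x[d] - mean[d]) ** 2
                       for r, x in zip(resp, frames)) / weight)
               for d in range(dim)]
        new_priors.append(weight / n_frames)
        new_means.append(mean)
        new_vars.append(var)
    return new_priors, new_means, new_vars

# Toy data: two well-separated groups of 2-dimensional "frames".
frames = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0],
          [4.0, 4.1], [4.2, 3.9], [3.9, 4.0]]
priors = [0.5, 0.5]
means = [[0.0, 0.0], [1.0, 1.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

before = log_likelihood(frames, priors, means, variances)
for _ in range(10):
    priors, means, variances = em_step(frames, priors, means, variances)
after = log_likelihood(frames, priors, means, variances)
print("log-likelihood before and after EM:", before, after)
```

Working in the log domain keeps the computation stable when a frame lies far from a component, where the density of Equation 2.11 would otherwise underflow to zero. Each EM iteration should not decrease the log-likelihood of Equation 2.13, which gives a simple sanity check for any implementation.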