Computational Models of Music Similarity and their ... - OFAI

2 Audio-based Similarity Measures

on the expectations. The expectation step is:

\[
P(m \mid x_n, \Theta) = \frac{p(x_n \mid m, \Theta)\, P_m}{p(x_n)} = \frac{\mathcal{N}(x_n \mid \mu_m, \Sigma_m)\, P_m}{\sum_{m'=1}^{M} \mathcal{N}(x_n \mid \mu_{m'}, \Sigma_{m'})\, P_{m'}}. \tag{2.14}
\]

The maximization step is:

\[
\begin{aligned}
\mu_m^* &= \frac{\sum_n P(m \mid x_n, \Theta)\, x_n}{\sum_{n'} P(m \mid x_{n'}, \Theta)}, \\
\Sigma_m^* &= \frac{\sum_n P(m \mid x_n, \Theta)\, (x_n - \mu_m)(x_n - \mu_m)^T}{\sum_{n'} P(m \mid x_{n'}, \Theta)}, \\
P_m^* &= \frac{1}{N} \sum_n P(m \mid x_n, \Theta).
\end{aligned} \tag{2.15}
\]

Cluster Model Similarity

To compute the similarity of pieces A and B a sample from each GMM is drawn, $X_A$ and $X_B$ respectively. (It is not feasible to store and use the original MFCC frames instead, due to memory constraints when dealing with large collections.) In the remainder of this section a sample size of 2000 is used. The log-likelihood $L(X \mid \Theta)$ that a sample $X$ was generated by the model $\Theta$ is computed for each piece/sample combination (Equation 2.13). Aucouturier and Pachet [AP02a] suggest computing the distance as:

\[
d_{AB} = L(X_A \mid \Theta_A) + L(X_B \mid \Theta_B) - L(X_A \mid \Theta_B) - L(X_B \mid \Theta_A). \tag{2.16}
\]

Note that $L(X_A \mid \Theta_B)$ and $L(X_B \mid \Theta_A)$ are generally different values. However, a symmetric similarity measure ($d_{AB} = d_{BA}$) is very desirable for most applications. Thus, both are used for $d_{AB}$. The self-similarity is added to normalize the results. In most cases the following statements are true: $L(X_A \mid \Theta_A) > L(X_A \mid \Theta_B)$ and $L(X_A \mid \Theta_A) > L(X_B \mid \Theta_A)$.

The distance is (slightly) different every time it is computed due to the randomness of the sampling. Furthermore, if some randomness is used to initialize the GMM, different models will be produced every time and will also lead to different distance values.

ISMIR 2004 Genre Classification Contest

An implementation of G30 using a nearest neighbor classifier with the following parameters won the ISMIR 2004 genre classification contest. Three minutes from the center of each piece (22 kHz, mono) were used for analysis. The MFCCs were computed using 19 coefficients (the first is ignored). The
FFT window size was 512 with 50% overlap (hop size 256). K-means was used to initialize the GMM. The number of samples used to compute the distance was 2000. The necessary computation time far exceeded the time constraints of the ISMIR 2005 (MIREX) competition. The implementation is available in the MA toolbox for Matlab [Pam04].

Illustrations

Figure 2.10 shows some characteristics of G30. One of the observations is that the first and last rows are very similar. That is, the original frames and the 2000 frames sampled from the GMM generate very similar histograms. Thus, the GMM appears to be suited to represent the distribution of the data.

A further observation is that only a few of the original frames and sampled frames have a high probability. The majority have a very low probability. This can be seen in rows 4 and 5. Note that both rows would look quite different if a new GMM were trained (or, in the case of row 5, if a new sample were drawn from the same GMM).

Row 2 shows that most centers have a rather similar shape. Row 3 shows that the variances for some pieces are larger than for others. For example, Someday has less variance than Kathy’s Waltz. Also noticeable is the typical shape of a spectrum: in the higher frequency bands there is little energy, and in the lower frequency bands there is usually more variance.

2.2.3.3 Thirty Gaussians Simplified (G30S)

G30S [Pam05] is basically a computational optimization of G30 and is based on merging ideas from [LS01] and [AP02a]. As suggested in [LS01], k-means is used to cluster the MFCC frames instead of GMM-EM. In addition, two clusters are automatically merged if they are very similar. In particular, k-means is first used to find 30 clusters. If the distance between two of these is below a (manually) defined threshold, they are merged and k-means is used to find 29 clusters. This is repeated until all clusters have at least a minimum distance to each other. Empty clusters (i.e.
clusters which do not represent any frames) are deleted. The maximum number of clusters per piece is 30 and the minimum is 1. The threshold is set so that most pieces have 30 clusters and only very few have fewer than 20. In practice, a piece does not end up with only 1 cluster (unless it is mostly silent). This optimization can be very useful since the distance computation time depends quadratically on the number of clusters.
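
The merging procedure described above can be sketched as follows. This is a minimal illustration, not the G30S implementation from [Pam05]: the text does not say how the distance between two clusters is measured or how two clusters are merged, so the sketch assumes the Euclidean distance between cluster centers and replaces a close pair by its midpoint before re-running k-means. The function names (`kmeans`, `closest_pair`, `cluster_with_merging`) and the parameters `min_dist`, `iterations` and `seed` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two frames (lists of floats)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(frames, centers, iterations=20):
    """Plain Lloyd iterations starting from the given centers.

    Centers that end up representing no frames are deleted.
    """
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for f in frames:
            nearest = min(range(len(centers)),
                          key=lambda i: sq_dist(f, centers[i]))
            groups[nearest].append(f)
        # Recompute each center as the mean of its frames; drop empty clusters.
        centers = [[sum(col) / len(g) for col in zip(*g)]
                   for g in groups if g]
    return centers


def closest_pair(centers):
    """Return (squared distance, i, j) for the two closest centers."""
    return min((sq_dist(centers[i], centers[j]), i, j)
               for i in range(len(centers))
               for j in range(i + 1, len(centers)))


def cluster_with_merging(frames, max_clusters=30, min_dist=1.0, seed=0):
    """Cluster frames, merging centers that are closer than min_dist.

    Assumption: "distance between clusters" is the Euclidean distance
    between centers, and a merge replaces two centers by their midpoint.
    """
    rng = random.Random(seed)
    start = rng.sample(frames, min(max_clusters, len(frames)))
    centers = kmeans(frames, start)
    while len(centers) > 1:
        d, i, j = closest_pair(centers)
        if d >= min_dist ** 2:
            break  # all remaining centers are at least min_dist apart
        merged = [(a + b) / 2 for a, b in zip(centers[i], centers[j])]
        centers = [c for k, c in enumerate(centers) if k not in (i, j)]
        centers = kmeans(frames, centers + [merged])  # one cluster fewer
    return centers
```

Each pass through the loop reduces the number of clusters by at least one, so the procedure terminates with between 1 and `max_clusters` centers. On two tight, well-separated groups of frames, starting from 5 clusters with `min_dist=1.0`, the sketch ends with 2 centers.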