Computational Models of Music Similarity and their ... - OFAI

Figure 2.7: DCT matrix, MFCCs, and reconstructed Mel power spectrum (dB) for the audio signal used in Figure 2.3. High values in the DCT matrix are visualized as black. The DCT matrix is plotted with DCT coefficients on the y-axis and Mel frequency bands on the x-axis.

2.2.2.4 DCT

The Discrete Cosine Transform (DCT) is applied to compress the Mel power spectrum. In particular, the num_filt (e.g. 36) frequency bands are represented by num_coeffs (e.g. 20) coefficients. Alternatives include, e.g., Principal Component Analysis. A side effect of the compression is that the spectrum is smoothed along the frequency axis, which can be interpreted as a simple approximation of the spectral masking in the human auditory system.

The DCT matrix can be computed as follows (see footnote 11).

    num_coeffs = 20;                                          (2.8)

    DCT = 1/sqrt(num_filt/2) * ...
        cos((0:num_coeffs-1)'*(0.5:num_filt)*pi/num_filt);
    DCT(1,:) = DCT(1,:)*sqrt(2)/2;

Footnote 11: This code is based on Malcolm Slaney's Auditory Toolbox.

Thus the DCT matrix has num_coeffs rows and num_filt columns. Each of the rows corresponds to an eigenvector, starting with the most important one (highest eigenvalue) in the first row. The first eigenvector describes the mean of the spectrum. The second describes a spectral pattern with high energy in the lower half of the frequencies and low energy in the upper half. The eigenvectors are orthogonal.

The DCT is applied to the Mel power spectrum (in decibels) as follows:

    mfcc = DCT * M_dB;                                        (2.9)

The effects of the DCT matrix on the Mel power spectrum are shown in Figure 2.7. The resulting MFCCs are a compressed representation of the
original data. In particular, while the original audio signal has 512 samples per 23 ms segment (22050 Hz, mono), the MFCC representation only requires 20 values per 12 ms (using 50% overlap for the power spectrum). Depending on the application, the number of coefficients (line 1 in Algorithm 2.8) can range from 8 to 40. However, the number of coefficients is always lower than the number of Mel frequency bands used. The number of Mel frequency bands can be adjusted in Algorithm 2.4, line 1.

To understand the smoothing effects of the DCT, it is useful to look at the reconstructed Mel power spectrum (Figure 2.7) and compare it to the original Mel power spectrum (Figure 2.6). The reconstructed spectrum is computed as:

    M_dB_rec = DCT' * mfcc;                                   (2.10)

To further understand the smoothing, it is useful to illustrate DCT'*DCT for different values of num_coeffs (see Figure 2.8). If the number of coefficients equals the number of Mel filters, then DCT'*DCT is the identity matrix (thus no smoothing occurs). For lower values, the smoothing between neighboring frequency bands is clearly visible. However, it is also important to realize that the smoothing effects are not limited to neighboring frequency bands.

Figure 2.8: DCT'*DCT for num_coeffs = 5, 10, 15, 20, and 36. Each plot is a 36 by 36 matrix. Both axes correspond to Mel frequency bands.

To conclude the computation of the MFCCs, Figure 2.9 illustrates the computation steps on the first 10-second sequence shown in Figure 2.2. Noticeable are (1) the changes in frequency resolution when transforming the power spectrum to the Mel power spectrum, (2) that MFCCs, when viewed directly, are difficult to interpret and that most of the variations occur in the lower coefficients, and (3) the effects of the DCT-based smoothing (when comparing the Mel power spectrum with the reconstructed version).
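
The compression and smoothing behavior described in this section can be checked numerically. The following is a minimal sketch, not the thesis code: it ports Algorithm 2.8 and Equations 2.9 and 2.10 from Matlab to plain Python lists. The function names (dct_matrix, smoothing_matrix, reconstruct) and the single-peak test frame are assumptions introduced here for illustration only.

```python
import math


def dct_matrix(num_coeffs, num_filt):
    """Port of Algorithm 2.8: num_coeffs rows, num_filt columns."""
    scale = 1.0 / math.sqrt(num_filt / 2.0)
    rows = [
        [scale * math.cos(k * (n + 0.5) * math.pi / num_filt)
         for n in range(num_filt)]
        for k in range(num_coeffs)
    ]
    # Rescale the first (constant) row so that every row has unit length.
    rows[0] = [v * math.sqrt(2.0) / 2.0 for v in rows[0]]
    return rows


def transpose(a):
    return [list(col) for col in zip(*a)]


def matvec(a, x):
    return [sum(v * w for v, w in zip(row, x)) for row in a]


def matmul(a, b):
    b_t = transpose(b)
    return [[sum(v * w for v, w in zip(row, col)) for col in b_t] for row in a]


def smoothing_matrix(num_coeffs, num_filt):
    """DCT' * DCT, the matrix that Figure 2.8 visualizes."""
    d = dct_matrix(num_coeffs, num_filt)
    return matmul(transpose(d), d)


def reconstruct(frame_db, num_coeffs):
    """Compress one Mel frame (Eq. 2.9) and reconstruct it (Eq. 2.10)."""
    d = dct_matrix(num_coeffs, len(frame_db))
    mfcc = matvec(d, frame_db)           # mfcc = DCT * M_dB
    return matvec(transpose(d), mfcc)    # M_dB_rec = DCT' * mfcc


# Hypothetical test frame: 36 Mel bands, a single 60 dB peak in one band.
frame_db = [0.0] * 36
frame_db[10] = 60.0
rec_20 = reconstruct(frame_db, 20)  # smoothed reconstruction
rec_36 = reconstruct(frame_db, 36)  # all coefficients kept
```

With all 36 coefficients, rec_36 matches frame_db up to rounding error, because DCT'*DCT is then the identity. With 20 coefficients, the peak in rec_20 is lower and its energy is spread over other bands, while the sum over all bands is preserved because the first row of the DCT matrix captures the mean.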