G. Peeters A Large Set of Audio Features for Sound Description 2004
peeters@ircam.fr
http://www.ircam.fr/
23/04/04
1 Introduction
1.1 Features taxonomy
1.2 Organization of the paper
2 Pre-computing
2.1 Energy envelope
2.2 Short-Time Fourier Transform
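As a minimal sketch of the short-time Fourier transform named in the heading above: the signal is cut into overlapping frames, each frame is weighted by a window, and a DFT gives one magnitude spectrum per frame. The Hann window, the frame and hop sizes, and the names `dft_magnitude` and `stft_magnitude` are illustrative assumptions, not the settings of the paper; a naive DFT is used only to keep the example free of dependencies.

```python
import cmath
import math


def dft_magnitude(frame):
    """Magnitude of the DFT of one real frame, bins 0 .. N/2 (naive O(N^2) DFT)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def stft_magnitude(signal, frame_size=64, hop_size=32):
    """Hann-windowed magnitude spectra, one list of bins per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / frame_size) for t in range(frame_size)]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop_size):
        frame = [signal[start + t] * window[t] for t in range(frame_size)]
        frames.append(dft_magnitude(frame))
    return frames


# Usage: a sinusoid centred on bin 8 of a 64-point frame.
sine = [math.sin(2 * math.pi * 8 * t / 64) for t in range(256)]
spectra = stft_magnitude(sine)
peak_bin = max(range(len(spectra[0])), key=lambda k: spectra[0][k])
```

Each inner list of `spectra` holds the bins of one frame; most spectral descriptors of the later sections take one such frame as input.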
2.3 Sinusoidal Harmonic modeling
2.4 Perceptual model
2.4.1 Mid-ear filtering
[Figure: mid-ear filter response. Axes: Amplitude [db20] versus Frequency [Hz].]
2.4.2 Mel scale
[Figure: Mel filter bank. Panel title: "Number of mel bands: 24". Axis: Frequency [Hz].]
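The Mel-scale formulas of this section did not survive extraction. As a hedged illustration only, the sketch below uses one widely used analytic form, mel = 2595 log10(1 + f/700), which may differ from the expression in the paper. The 24 bands come from the figure title; the frequency range and the function names are assumptions.

```python
import math


def hz_to_mel(f_hz):
    """Frequency in Hz to Mel (assumed analytic form, not necessarily the paper's)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_centers(n_bands=24, f_min=0.0, f_max=22050.0):
    """Centre frequencies (Hz) of n_bands bands equally spaced on the Mel scale."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_bands + 1)
    return [mel_to_hz(m_min + step * (b + 1)) for b in range(n_bands)]


centers = mel_band_centers()
```

Bands that are equally spaced in Mel become wider in Hz as frequency grows, which is the behaviour the filter-bank figure illustrates.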
2.4.3 Bark scale
[Figure: Bark filter bank. Panel title: "Number of bark bands: 24". Axis: Frequency [Hz].]
2.5 Amplitude and Frequency scale
2.5.1 Amplitude scales
2.5.2 Frequency scales
[Figure: six panels combining three amplitude scales (Ampl, Log-ampl, Power) with two frequency scales (Freq, Log-freq).]
2.6 Descriptors on Spectrum / Harmonic peaks / Bark bands
3 Global temporal features
3.1 Envelope characterization
3.1.1 Attack / Decay / Sustain / Release envelope modeling
[Figure: envelope segments: attack, decay, sustain, release.]
[Figure: sustained sound and non-sustained sound, segmented into attack and rest.]
3.1.2 Attack part
3.1.2.1 Estimation of the start and end of the attack
[Figure: estimation of the attack start and end on the energy envelope. Labels: 20%, 90%, threshold 1, threshold 2, effort 12, effort 23, start, end. Axes: energy versus time.]
3.1.2.2 Log-Attack Time (mpeg7:LogAttackTime) DT.g_lat
$$ \mathrm{lat} = \log_{10}(\mathrm{stop\_attack} - \mathrm{start\_attack}) $$
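As a minimal sketch of the log-attack time, assuming the MPEG-7 reading of the descriptor (base-10 logarithm of the attack duration): the start and end of the attack are located here with fixed thresholds at 20% and 90% of the envelope maximum, the two percentages visible in the figure labels. The paper may locate them adaptively instead, and the function and parameter names are hypothetical.

```python
import math


def log_attack_time(envelope, sample_rate, start_ratio=0.2, end_ratio=0.9):
    """log10 of the time (s) the envelope takes to rise from 20% to 90% of its maximum."""
    peak = max(envelope)
    start = next(i for i, e in enumerate(envelope) if e >= start_ratio * peak)
    end = next(i for i, e in enumerate(envelope) if e >= end_ratio * peak)
    duration = max(end - start, 1) / sample_rate  # at least one sample, avoids log10(0)
    return math.log10(duration)


# Usage: envelope rising linearly to 1.0 over 100 samples, then constant, at 1000 Hz.
env = [min(i / 100.0, 1.0) for i in range(300)]
lat = log_attack_time(env, sample_rate=1000.0)
```

The attack here lasts about 0.07 s, so `lat` is close to log10(0.07); shorter attacks give more negative values.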
3.1.2.3 Temporal increase (cuidado:TemporalIncrease) DT.g_incr
3.1.3 Sustain part
3.1.3.1 Decrease part: Temporal decrease (cuidado:TemporalDecrease) DT.g_decr
$$ s(t) = A \cdot e^{-\alpha (t - t_{\max})}, \qquad t > t_{\max} $$
3.1.3.2 Sustain part: Energy Modulation and Fundamental frequency modulation
(mpeg7:AudioPower ScalableSeriesType element name="Modulation")
(mpeg7:AudioFundamentalFrequency ScalableSeriesType element name="Modulation")
3.1.4 Example
[Figure: attack and decrease estimation for F:\data\class\sol\sust\bowedstring\alto\mf\alto\_a\_gref\_mf\_si3\_12.wav. Panel title: "Dlat: -0.53981 - threshold: 0.15 - Dincr: 3.265 - Ddecr: -0.28535". Legend: incr, incr2, desc. Markers: satt_posn, eatt_posn, maxenv_posn.]
[Figure: modulation estimation. Panel title: "MODam: 0.060872 - MODfr: 5.3833". Legend: envelop-v, polyfit, hatenvelop-v, fft(envelop v -polyfit).]
3.2 Others
3.2.1 Temporal centroid (mpeg7:TemporalCentroid) DT.g_tc
$$ tc = \frac{\sum_t e(t) \cdot t}{\sum_t e(t)} $$
with e(t) the energy envelope.
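The temporal centroid can be sketched as the energy-weighted mean time of the envelope. The function name and the choice to return seconds (sample index divided by a sampling rate) are assumptions made for this example.

```python
def temporal_centroid(envelope, sample_rate):
    """Energy-weighted mean time in seconds: sum(t * e(t)) / sum(e(t))."""
    total = sum(envelope)
    if total == 0:
        return 0.0
    weighted = sum(i * e for i, e in enumerate(envelope))
    return weighted / total / sample_rate


tc_flat = temporal_centroid([1.0] * 5, sample_rate=1.0)             # symmetric envelope
tc_late = temporal_centroid([0.0, 0.0, 0.0, 1.0], sample_rate=2.0)  # energy at the end
```

A flat envelope has its centroid in the middle (`tc_flat` is 2.0), while energy concentrated late in the sound moves it toward the end (`tc_late` is 1.5).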
3.2.2 Effective Duration (cuidado:TemporalEffectiveDuration) DT.g_ed
[Figure: signal energy envelope, threshold, and effective duration. Axes: Amplitude versus Time.]
4 Instantaneous temporal features
4.1 Auto-correlation (cuidado:AudioXcorr) DT.i_xcorr_m
[Figure: signal and its auto-correlation. Legend: signal, xcorr. Axis labels: Amplitude, Time, Frequency.]
4.2 Zero-crossing rate (cuidado:AudioZcr) DT.i_zcr_v
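A minimal sketch of the zero-crossing rate named in the heading above, counting sign changes between consecutive samples and expressing them per second. The per-second normalisation and the function name are assumptions; other conventions divide by the number of samples instead.

```python
def zero_crossing_rate(frame, sample_rate):
    """Number of sign changes between consecutive samples, per second."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    duration = len(frame) / sample_rate
    return crossings / duration


# Usage: 100 samples alternating in sign, sampled at 100 Hz (one second of signal).
zcr = zero_crossing_rate([1.0, -1.0] * 50, sample_rate=100.0)
```

Noisy signals change sign often and give high values; periodic signals give values related to their fundamental frequency.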
5 Energy features
5.1 Total Energy (mpeg7:AudioPower) DE.i_tot_v
5.2 Harmonic Part Energy (cuidado:AudioHarmonicPower) DE.i_harmo_v
5.3 Noise Part Energy (cuidado:AudioNoisePower) DE.i_noise_v
6 Spectral features
6.1 Spectral shape description
6.1.1 Spectral centroid (mpeg7:AudioSpectrumCentroid) DS.i_sc_v
$$ \mu = \int x \cdot p(x)\, \delta x $$
with x the observed frequencies and p(x) the amplitude at x, normalized so that the amplitudes sum to one.
6.1.2 Spectral spread (mpeg7:AudioSpectrumSpread) DS.i_ss_v
$$ \sigma^2 = \int (x - \mu)^2 \cdot p(x)\, \delta x $$
6.1.3 Spectral skewness (cuidado:AudioSpectrumSkewness) DS.i_skew_v
$$ m_3 = \int (x - \mu)^3 \cdot p(x)\, \delta x, \qquad \gamma_1 = \frac{m_3}{\sigma^3} $$
[Figure: three example distributions with Gaussian fit. Legend: data, gauss fit. Panel titles: "mean: 7.872e-017 std: 5 skew: -8.3254e-017 kurt: 3"; "mean: 16.67 std: 23.5714 skew: -0.56569 kurt: 2.4"; "mean: -16.67 std: 23.5714 skew: 0.56569 kurt: 2.4".]
6.1.4 Spectral kurtosis (cuidado:AudioSpectrumKurtosis) DS.i_kurto_v
$$ m_4 = \int (x - \mu)^4 \cdot p(x)\, \delta x, \qquad \gamma_2 = \frac{m_4}{\sigma^4} $$
[Figure: three example distributions with Gaussian fit. Legend: data, gauss fit. Panel titles: "mean: 7.872e-017 std: 5 skew: -8.3254e-017 kurt: 3"; "mean: -2.1597e-015 std: 28.8704 skew: 3.1204e-016 kurt: 1.8"; "mean: 0.0049998 std: 1.4142 skew: 5.3032e-007 kurt: 6.0003".]
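The four spectral shape descriptors (centroid, spread, skewness, kurtosis) can be sketched as the first four standardized moments of a distribution whose values are the bin frequencies and whose probabilities are the normalized amplitudes. This is a sketch under that assumption; the function name is hypothetical and the integrals become sums over bins.

```python
import math


def spectral_moments(freqs, amps):
    """Centroid, spread, skewness and kurtosis of an amplitude spectrum.

    freqs are the observed values x; amps, normalised to sum to 1, play the
    role of the probabilities p(x). The spectrum must not be a single peak
    (spread of zero), otherwise skewness and kurtosis are undefined.
    """
    total = sum(amps)
    p = [a / total for a in amps]
    mu = sum(x * px for x, px in zip(freqs, p))
    var = sum((x - mu) ** 2 * px for x, px in zip(freqs, p))
    sigma = math.sqrt(var)
    m3 = sum((x - mu) ** 3 * px for x, px in zip(freqs, p))
    m4 = sum((x - mu) ** 4 * px for x, px in zip(freqs, p))
    return mu, sigma, m3 / sigma ** 3, m4 / sigma ** 4


# Usage: a flat spectrum over 0..100 Hz is symmetric, so its skewness is 0.
freqs = [float(f) for f in range(101)]
mu, sigma, skew, kurt = spectral_moments(freqs, [1.0] * 101)
```

For this flat spectrum the kurtosis is close to 1.8, the value shown for the flat distribution in the kurtosis figure.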
6.1.5 Spectral slope (cuidado:AudioSpectrumSlope) DS.i_slope_v
$$ \hat{a}(f) = \mathrm{slope} \cdot f + \mathrm{const} $$
$$ \mathrm{slope} = \frac{1}{\sum_k a(k)} \cdot \frac{N \sum_k f(k)\, a(k) - \sum_k f(k) \sum_k a(k)}{N \sum_k f^2(k) - \left( \sum_k f(k) \right)^2} $$
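As a hedged sketch of the spectral slope, the code below fits a straight line to amplitude against frequency by least squares and returns its slope. Dividing the slope by the sum of the amplitudes, so that it does not depend on the overall level, is treated as an option and is an assumption of this example; the names are hypothetical.

```python
def spectral_slope(freqs, amps, normalise=True):
    """Least-squares slope of amplitude against frequency.

    With normalise=True the slope is divided by the sum of the amplitudes
    (an assumption made here to remove the dependence on signal level).
    """
    n = len(freqs)
    sum_f = sum(freqs)
    sum_a = sum(amps)
    sum_fa = sum(f * a for f, a in zip(freqs, amps))
    sum_ff = sum(f * f for f in freqs)
    slope = (n * sum_fa - sum_f * sum_a) / (n * sum_ff - sum_f ** 2)
    return slope / sum_a if normalise else slope


# Usage: amplitudes falling exactly on the line a(f) = 10 - 2 f.
freqs = [0.0, 1.0, 2.0, 3.0]
amps = [10.0, 8.0, 6.0, 4.0]
raw = spectral_slope(freqs, amps, normalise=False)
rel = spectral_slope(freqs, amps)
```

Here `raw` recovers the slope of the line, -2, and `rel` is the same value divided by the total amplitude, 28.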
6.1.6 Spectral decrease (cuidado:AudioSpectrumDecrease) DS.i_decr_v
$$ \mathrm{decrease} = \frac{1}{\sum_{k=2}^{K} a(k)} \sum_{k=2}^{K} \frac{a(k) - a(1)}{k - 1} $$
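A minimal sketch of the spectral decrease, assuming it averages the slopes from the first bin to every later bin and normalises by the sum of the later amplitudes. Bins are numbered from 1 in this definition and from 0 in the Python list; the function name is hypothetical.

```python
def spectral_decrease(amps):
    """Average slope from the first bin to each later bin, normalised by sum(amps[1:])."""
    denom = sum(amps[1:])
    if denom == 0:
        return 0.0
    # The index i equals k - 1 for the 1-based bin number k = 2 .. K.
    num = sum((a - amps[0]) / i for i, a in enumerate(amps[1:], start=1))
    return num / denom


dec = spectral_decrease([4.0, 2.0, 1.0, 1.0])
flat = spectral_decrease([1.0, 1.0, 1.0])
```

A spectrum that falls away from its first bin gives a negative value (`dec` is -1.125), and a flat spectrum gives 0.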
6.1.7 Spectral roll-off (cuidado:AudioSpectrumRollOff) DS.i_rolloff_v
$$ \sum_{0}^{f_c} a^2(f) = 0.95 \sum_{0}^{sr/2} a^2(f) $$
6.2 Temporal variation of spectrum
6.2.1 Temporal variation of spectrum: spectral variation (cuidado:AudioSpectrumVariation) DS.i_var_v
$$ \mathrm{variation} = 1 - \frac{\sum_k a(t-1, k) \cdot a(t, k)}{\sqrt{\sum_k a(t-1, k)^2}\, \sqrt{\sum_k a(t, k)^2}} $$
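The spectral roll-off and the spectral variation named in the headings above can be sketched together on plain lists of bin amplitudes. The 95% energy ratio for the roll-off, and the reading of the variation as one minus the normalised cross-correlation of two successive frames, are common conventions assumed here; the function names are hypothetical.

```python
import math


def spectral_rolloff(freqs, amps, ratio=0.95):
    """Lowest frequency below which `ratio` of the spectral energy (sum of a^2) lies."""
    energies = [a * a for a in amps]
    threshold = ratio * sum(energies)
    cumulative = 0.0
    for f, e in zip(freqs, energies):
        cumulative += e
        if cumulative >= threshold:
            return f
    return freqs[-1]


def spectral_variation(prev_amps, cur_amps):
    """1 minus the normalised cross-correlation of two successive amplitude spectra."""
    cross = sum(p * c for p, c in zip(prev_amps, cur_amps))
    norm = math.sqrt(sum(p * p for p in prev_amps)) * math.sqrt(sum(c * c for c in cur_amps))
    if norm == 0:
        return 0.0
    return 1.0 - cross / norm


freqs = [100.0 * k for k in range(10)]
rolloff = spectral_rolloff(freqs, [1.0] * 10)                 # flat spectrum
same = spectral_variation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])   # same shape, louder
disjoint = spectral_variation([1.0, 0.0], [0.0, 1.0])         # no common bin
```

The variation is close to 0 when two frames have the same spectral shape, whatever their level, and reaches 1 when they share no energy in any bin.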
6.3 Global spectral shape description
6.3.1 Mel Frequency Cepstral Coefficients (MFCC) (cuidado:AudioMFCC) DP.i_MFCC_m
[Diagram: s(n) -> FFT -> MelBand -> Log -> DCT -> MFCC]
$$ \mathrm{DMFCC} = \frac{\partial\, \mathrm{MFCC}}{\partial t}, \qquad \mathrm{DDMFCC} = \frac{\partial\, \mathrm{DMFCC}}{\partial t} $$
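The processing chain of the MFCC (spectrum, Mel bands, logarithm, DCT) can be sketched as follows, starting from one magnitude spectrum. The 24 Mel bands and 12 coefficients match the figure title and the descriptor list of this document; the triangular filter shape, the Mel formula, the omission of the coefficient of order 0 and all function names are assumptions of this example, and the mid-ear filtering step is left out.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_bands, n_bins, sample_rate):
    """Triangular filters with centres equally spaced on the Mel scale."""
    nyquist = sample_rate / 2.0
    edges_mel = [hz_to_mel(nyquist) * i / (n_bands + 1) for i in range(n_bands + 2)]
    edges_hz = [mel_to_hz(m) for m in edges_mel]
    bin_hz = [nyquist * k / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for b in range(n_bands):
        lo, mid, hi = edges_hz[b], edges_hz[b + 1], edges_hz[b + 2]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc(magnitudes, sample_rate, n_bands=24, n_coeffs=12):
    """Magnitude spectrum -> Mel band energies -> log -> DCT-II coefficients 1..n_coeffs."""
    bank = mel_filterbank(n_bands, len(magnitudes), sample_rate)
    energies = [sum(w * m for w, m in zip(row, magnitudes)) for row in bank]
    log_e = [math.log(e + 1e-10) for e in energies]  # small offset avoids log(0)
    return [
        sum(log_e[b] * math.cos(math.pi * c * (b + 0.5) / n_bands) for b in range(n_bands))
        for c in range(1, n_coeffs + 1)
    ]


# Usage: a flat magnitude spectrum of 513 bins at 44.1 kHz, at two different levels.
coeffs = mfcc([1.0] * 513, sample_rate=44100.0)
louder = mfcc([2.0] * 513, sample_rate=44100.0)
```

Because the coefficient of order 0 is skipped, a change of overall level leaves the coefficients unchanged: `coeffs` and `louder` agree up to rounding.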
[Figure: MFCC computation example. Panels: spectrum and mid-ear spectrum (Log-amplitude versus Frequency); Mel band spectrum and MFCC spectrum (Log-amplitude versus Mel band); MFCC (Value versus MFC coefficient).]
7 Harmonic features
7.1.1 Fundamental frequency (mpeg7:AudioFundamentalFrequency) DH.i_f0_v
7.1.2 Noisiness (mpeg7:AudioHarmonicity) DH.i_noisiness_v
7.1.3 Inharmonicity (cuidado:AudioInharmonicity) DH.i_inharmo_v
$$ \mathrm{inharmo} = \frac{2}{f_0} \cdot \frac{\sum_h \left| f(h) - h \cdot f_0 \right| \cdot a^2(h)}{\sum_h a^2(h)} $$
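As a hedged sketch of the inharmonicity, the code below measures how far the partial frequencies lie from exact multiples of the fundamental, weighting each deviation by the energy of the partial. The factor 2/f0 and the energy weighting are assumptions of this example, as are the function and argument names.

```python
def inharmonicity(f0, partial_freqs, partial_amps):
    """Energy-weighted deviation of partials from exact multiples of f0.

    partial_freqs[h-1] and partial_amps[h-1] describe harmonic number h.
    """
    energy = sum(a * a for a in partial_amps)
    if energy == 0 or f0 <= 0:
        return 0.0
    deviation = sum(
        abs(f - h * f0) * a * a
        for h, (f, a) in enumerate(zip(partial_freqs, partial_amps), start=1)
    )
    return 2.0 / f0 * deviation / energy


harmonic = inharmonicity(100.0, [100.0, 200.0, 300.0], [1.0, 0.5, 0.25])
stretched = inharmonicity(100.0, [100.0, 205.0, 315.0], [1.0, 1.0, 1.0])
```

A perfectly harmonic series gives 0; partials that drift upward from the harmonic positions, as in `stretched`, give a positive value.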
[Figure: measured partial frequencies f(1), ..., f(6) compared with the harmonic positions f0, 2 f0, ..., 7 f0. Axes: energy versus frequency.]
7.1.4 Harmonic Spectral Deviation (mpeg7:HarmonicSpectralDeviation) DH.i_devs_v
[Figure: harmonic amplitudes and spectral envelope. Panel title: "Spectral deviation: 0.15374". Legend: spectral envelop, harmonics. Axes: Amplitude versus Frequency [harm number].]
7.1.5 Odd to Even Harmonic Energy Ratio (cuidado:HarmonicSpectralOERatio) DH.i_oeratio_v
$$ \mathrm{OER} = \frac{\sum_{h=1:2:H} a^2(h)}{\sum_{h=2:2:H} a^2(h)} $$
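A minimal sketch of the odd to even harmonic energy ratio, assuming that the energy of a harmonic is its squared amplitude and that the list starts at harmonic number 1. The function name and the handling of a zero denominator are choices made for this example.

```python
def odd_even_ratio(harmonic_amps):
    """Energy of harmonics 1, 3, 5, ... over energy of harmonics 2, 4, 6, ...

    harmonic_amps[h-1] is the amplitude of harmonic number h.
    """
    odd = sum(a * a for a in harmonic_amps[0::2])
    even = sum(a * a for a in harmonic_amps[1::2])
    return odd / even if even > 0 else float("inf")


ratio = odd_even_ratio([1.0, 0.5, 1.0, 0.5])
```

Sounds with weak even harmonics give ratios well above 1; here the odd harmonics carry four times the energy of the even ones.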
[Figure: odd and even harmonic amplitudes. Panel title: "Odd/even harmonic energy ratio: 3.2431". Legend: odd harmonic, even harmonic. Axes: Amplitude versus Frequency [harmonic number].]
7.1.6 Tristimulus (cuidado:HarmonicSpectralTristimulus) DH.i_tri*_v
$$ T_1 = \frac{a(1)}{\sum_h a(h)}, \qquad T_2 = \frac{a(2) + a(3) + a(4)}{\sum_h a(h)}, \qquad T_3 = \frac{\sum_{h=5}^{H} a(h)}{\sum_h a(h)} $$
[Figure: harmonic amplitudes grouped by tristimulus. Panel title: "tri1: 0.49442 tri2: 0.45368 tri3: 0.0519". Legend: tristimulus1, tristimulus2, tristimulus3.]
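The tristimulus can be sketched as three ratios that split the harmonic amplitudes into the first harmonic, harmonics 2 to 4, and harmonics 5 and above. The grouping follows the usual definition of the tristimulus and is assumed here; the function name is hypothetical.

```python
def tristimulus(harmonic_amps):
    """Relative weight of harmonic 1, harmonics 2-4, and harmonics 5 and above."""
    total = sum(harmonic_amps)
    if total == 0:
        return 0.0, 0.0, 0.0
    t1 = harmonic_amps[0] / total
    t2 = sum(harmonic_amps[1:4]) / total
    t3 = sum(harmonic_amps[4:]) / total
    return t1, t2, t3


t1, t2, t3 = tristimulus([4.0, 2.0, 1.0, 1.0, 1.0, 1.0])
```

The three values always sum to 1, as do the values in the figure title (0.49442, 0.45368, 0.0519).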
8 Perceptual features
8.1 Features
8.1.1 Total Loudness and specific loudness (cuidado:AudioLoudness) DP.i_loud_v
$$ N'(z) = E(z)^{0.23}, \qquad N = \sum_{z=1}^{\mathrm{bands}} N'(z) $$
8.1.2 Relative Specific Loudness (cuidado:AudioRelativeSpecificLoudness) DP.i_specloudnorm_m
$$ N'_{\mathrm{rel}}(z) = \frac{N'(z)}{N} $$
8.1.3 Sharpness (cuidado:AudioSharpness) DP.i_sharp_v
$$ A = 0.11 \cdot \frac{\sum_{z=1}^{\mathrm{bands}} z \cdot g(z) \cdot N'(z)}{N}, \qquad g(z) = \begin{cases} 1 & z < 15 \\ 0.066 \cdot \exp(0.171 \cdot z) & z \geq 15 \end{cases} $$
8.1.4 Spread (cuidado:AudioSpread) DP.i_spread_v
$$ ET = \left( \frac{N - \max_z N'(z)}{N} \right)^2 $$
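As a hedged sketch of the perceptual descriptors of this section, the code below derives total loudness, relative specific loudness, sharpness and spread from a list of Bark band energies. The exponent 0.23, the constants 0.11, 0.066 and 0.171, and the band threshold of 15 are assumptions of this example, as are the function names; the input is taken to be one energy value per Bark band, numbered from 1.

```python
import math


def specific_loudness(band_energies):
    """Specific loudness N'(z) = E(z) ** 0.23 for each Bark band z."""
    return [e ** 0.23 for e in band_energies]


def sharpness_weight(z):
    """Weight g(z): 1 below band 15, exponential in z from band 15 on."""
    return 1.0 if z < 15 else 0.066 * math.exp(0.171 * z)


def perceptual_features(band_energies):
    """Total loudness, relative specific loudness, sharpness and spread."""
    n_spec = specific_loudness(band_energies)
    total = sum(n_spec)
    if total == 0:
        return 0.0, [0.0] * len(n_spec), 0.0, 0.0
    relative = [n / total for n in n_spec]
    sharp = 0.11 * sum(
        z * sharpness_weight(z) * n for z, n in enumerate(n_spec, start=1)
    ) / total
    spread = ((total - max(n_spec)) / total) ** 2
    return total, relative, sharp, spread


# Usage: 24 Bark bands with equal energy.
total, relative, sharp, spread = perceptual_features([1.0] * 24)
```

With equal energy in all bands the spread is close to 1; it falls toward 0 when a single band dominates the loudness.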
9 Various features
9.1 Spectral Flatness/Crest measure (mpeg7:AudioSpectrumFlatness) DP.sfm_m
$$ \mathrm{SFM}(\mathrm{band}) = \frac{\left( \prod_{k \in \mathrm{band}} a(k) \right)^{1/K}}{\frac{1}{K} \sum_{k \in \mathrm{band}} a(k)} $$
$$ \mathrm{SCM}(\mathrm{band}) = \frac{\max\left( a(k \in \mathrm{band}) \right)}{\frac{1}{K} \sum_{k \in \mathrm{band}} a(k)} $$
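The spectral flatness and crest measures can be sketched per frequency band as ratios against the arithmetic mean of the bin amplitudes: the geometric mean for flatness, the maximum for crest. The four band limits used in the usage line are hypothetical values chosen only so that the example returns four values per measure, as in the descriptor list; the function names are also assumptions.

```python
import math


def flatness_and_crest(amps):
    """Spectral flatness (geometric / arithmetic mean) and crest (max / arithmetic mean)."""
    k = len(amps)
    if k == 0:
        return 0.0, 0.0
    arithmetic = sum(amps) / k
    if arithmetic == 0:
        return 0.0, 0.0
    if min(amps) > 0:
        geometric = math.exp(sum(math.log(a) for a in amps) / k)
    else:
        geometric = 0.0  # any zero bin makes the product, hence the geometric mean, zero
    return geometric / arithmetic, max(amps) / arithmetic


def band_flatness_crest(freqs, amps, bands):
    """Apply flatness_and_crest to the bins of each (low, high) band in Hz."""
    result = []
    for low, high in bands:
        in_band = [a for f, a in zip(freqs, amps) if low <= f < high]
        result.append(flatness_and_crest(in_band))
    return result


flat, crest_flat = flatness_and_crest([2.0, 2.0, 2.0, 2.0])    # noise-like band
peaky, crest_peaky = flatness_and_crest([8.0, 1.0, 1.0, 1.0])  # one dominant peak
per_band = band_flatness_crest(
    [125.0 * k for k in range(40)],
    [1.0] * 40,
    [(250, 500), (500, 1000), (1000, 2000), (2000, 4000)],  # hypothetical band edges
)
```

A flat band gives a flatness and a crest of 1; a band dominated by one peak gives a lower flatness and a higher crest.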
10 Temporal modeling
10.1.1 Mean
10.1.2 Variance
10.1.3 Deviation
10.1.4 Temporal modeling an mpeg-7 audio scalable series
mpeg7::scalableseries. weight
scalableseries AudioLoudnessType
mpeg7:scalableseries numOfElements=1 element name
Element Name    Mpeg-7
Mean            yes
Variance        yes
Derivative      extension
Modulation      extension
11 List of all descriptors
LLD List                                           frame based  number of features  acronym                 xml tag
Temporal Features
Global Temporal Features
Log Attack Time                                    n            1                   DTg_lat                 mpeg7:LogAttackTime
Temporal Increase                                  n            1                   DTg_incr                cuidado:TemporalIncrease
Temporal Decrease                                  n            1                   DTg_decr                cuidado:TemporalDecrease
Temporal Centroid                                  n            1                   DTg_tc                  mpeg7:TemporalCentroid
Effective Duration                                 n            1                   DTg_ed                  cuidado:TemporalEffectiveDuration
Instantaneous Temporal Features
Signal Auto-correlation function                   y            12                  DTi_xcorr_m             cuidado:AudioXcorr
Zero-crossing rate                                 y            1                   DTi_zcr                 cuidado:AudioZcr
Energy Features
Total energy                                       y            1                   DEi_tot_v               mpeg7:AudioPower
Total energy Modulation (frequency, amplitude)     n            2                   DTg_mod_fr, DTg_mod_am  ScalableSeriesType element name="Modulation"
Total harmonic energy                              y            1                   DEi_harmo_v             cuidado:AudioHarmonicPower
Total noise energy                                 y            1                   DEi_noise_v             cuidado:AudioNoisePower
Spectral Features
Spectral Shape
Spectral centroid                                  y            6                   DSi_sc_m                mpeg7:AudioSpectrumCentroid (mpeg7:SpectralCentroid)
Spectral spread                                    y            6                   DSi_ss_m                mpeg7:AudioSpectrumSpread
Spectral skewness                                  y            6                   Dsi_skew_m              cuidado:AudioSpectrumSkewness
Spectral kurtosis                                  y            6                   Dsi_kurto_v             cuidado:AudioSpectrumKurtosis
Spectral slope                                     y            6                   Dsi_slope_v             cuidado:AudioSpectrumSlope
Spectral decrease                                  y            1                   Dsi_decs_c              cuidado:AudioSpectrumDecrease
Spectral rolloff                                   y            1                   Dsi_rolloff_v           cuidado:AudioSpectrumRollOff
Spectral variation                                 y            3                   Dsi_variation_v         cuidado:AudioSpectrumVariation
Global spectral shape description
MFCC                                               y            12                  DPi_mfcc_m              cuidado:AudioMFCC
Delta MFCC                                         y (post)     12                  DPi_Dmfcc_m
Delta Delta MFCC                                   y (post)     12                  DPi_DDmfcc_m
Harmonic Features
Fundamental frequency                              y            1                   DHi_f0_v                mpeg7:AudioFundamentalFrequency
Fundamental fr. Modulation (frequency, amplitude)  n            2                   F0 Mod AM, FR           ScalableSeriesType element name="Modulation"
Noisiness                                          y            1                   DHi_noisiness_v         mpeg7:AudioHarmonicity
Inharmonicity                                      y            1                   DHi_inharmo_v           cuidado:AudioInharmonicity
Harmonic Spectral Deviation                        y            3                   DHi_devs_v              mpeg7:HarmonicSpectralDeviation
Odd to Even Harmonic Ratio                         y            3                   Dhi_oeratio_v           cuidado:HarmonicSpectralOERatio
Harmonic Tristimulus                               y            9                   Dhi_tri_v               cuidado:HarmonicSpectralTristimulus
Harmonic Spectral Shape
HarmonicSpectral centroid                          y            6                   DHi_sc_m                mpeg7:HarmonicSpectralCentroid
HarmonicSpectral spread                            y            6                   DHi_ss_m                mpeg7:HarmonicSpectralSpread
HarmonicSpectral skewness                          y            6                   DHi_skew_m              cuidado:HarmonicSpectralSkewness
HarmonicSpectral kurtosis                          y            6                   DHi_kurto_v             cuidado:HarmonicSpectralKurtosis
HarmonicSpectral slope                             y            6                   DHi_slope_v             cuidado:HarmonicSpectralSlope
HarmonicSpectral decrease                          y            1                   DHi_decs_c              cuidado:HarmonicSpectralDecrease
HarmonicSpectral rolloff                           y            1                   DHi_rolloff_v           cuidado:HarmonicSpectralRollOff
HarmonicSpectral variation                         y            3                   DHi_variation_v         mpeg7:HarmonicSpectralVariation
Perceptual Features
Loudness                                           y            1                   DPi_loud_v              AudioLoudness
Relative Specific Loudness                         y            24                  DPi_specloud_m          cuidado:AudioRelativeSpecificLoudness
Sharpness                                          y            1                   DPi_sharp_v             cuidado:AudioSharpness
Spread                                             y            1                   DPi_spread_v            cuidado:AudioSpread
Perceptual Spectral Envelope Shape
Perceptual Spectral centroid                       y            6                   DPi_sc_m                cuidado:AudioFilterbankCentroid
Perceptual Spectral spread                         y            6                   DPi_ss_m                cuidado:AudioFilterbankSpread
Perceptual Spectral skewness                       y            6                   DPi_skew_m              cuidado:AudioFilterbandSkewness
Perceptual Spectral kurtosis                       y            6                   DPi_kurto_v             cuidado:AudioFilterbankKurtosis
Perceptual Spectral Slope                          y            6                   DPi_slope_v             cuidado:AudioFilterbankSlope
Perceptual Spectral Decrease                       y            1                   DPi_decs_c              cuidado:AudioFilterbankDecrease
Perceptual Spectral Rolloff                        y            1                   DPi_rolloff_v           cuidado:AudioFilterbankRolloff
Perceptual Spectral Variation                      y            3                   DPi_variation_v         cuidado:AudioFilterbankVariation
Odd to Even Band Ratio                             y            3                   DP_ioeratio_v           cuidado:AudioFilterbankOERatio
Band Spectral Deviation                            y            3                   DPi_devs_v              cuidado:AudioFilterbankDeviation
Band Tristimulus                                   y            9                   DPi_tri_v               cuidado:AudioFilterbankTristimulus
Various features
Spectral flatness                                  y            4                   DPi_sfm_m               mpeg7:AudioSpectrumFlatness
Spectral crest                                     y            4                   DPi_scm_m               cuidado:AudioSpectrumCrest
Total Number of Features                                        166
12 Acknowledgement