
A large set of audio features for sound description ... - WWW Ircam


G. Peeters A Large Set of Audio Features for Sound Description 2004<br />

1 Introduction<br />

1.1 Features taxonomy<br />

peeters@ircam.fr<br />

http://www.ircam.fr/<br />

23/04/04<br />

1.2 Organization of the paper<br />



2 Pre-computing<br />


2.1 Energy envelope<br />

2.2 Short-Time Fourier Transform<br />

2.3 Sinusoidal Harmonic modeling<br />


2.4 Perceptual model<br />


2.4.1 Mid-ear filtering<br />

2.4.2 Mel scale<br />

[Figure: mid-ear filter. Axes: Frequency [Hz] (logarithmic), Amplitude [db20].]<br />

[Figure: Mel filter bank. Panel title: Number of mel bands: 24. Horizontal axis: Frequency [Hz].]<br />
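The Mel formula itself did not survive in this copy, so the following is only a minimal sketch using one widely used Hz-to-mel mapping, 2595·log10(1 + f/700); the paper may use a different variant. The function names, the 22050 Hz upper limit and the equal spacing of band centres on the mel axis are assumptions made for illustration. Only the band count of 24 comes from the figure title.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Hz -> mel with the common 2595*log10(1 + f/700) mapping (assumed variant)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_band_centers(n_bands=24, f_max_hz=22050.0):
    """Centre frequencies (Hz) of n_bands bands equally spaced on the mel axis.

    f_max_hz is a hypothetical upper limit (half of a 44.1 kHz sampling rate).
    """
    edges_mel = np.linspace(0.0, hz_to_mel(f_max_hz), n_bands + 2)
    return mel_to_hz(edges_mel[1:-1])
```

With this mapping 1000 Hz falls very close to 1000 mel, and the band centres get progressively wider apart in Hz as frequency increases.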

2.4.3 Bark scale<br />

[Figure: Bark filter bank. Panel title: Number of bark bands: 24. Horizontal axis: Frequency [Hz].]<br />
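The Hz-to-Bark formula is not recoverable from this copy. As a hedged illustration, the sketch below uses the Zwicker and Terhardt approximation, 13·arctan(0.00076 f) + 3.5·arctan((f/7500)²), which may differ from the formula the paper uses. The function names and the floor-based assignment of a frequency to one of 24 bands are assumptions.

```python
import numpy as np

def hz_to_bark(f_hz):
    """Hz -> Bark using the Zwicker & Terhardt approximation (assumed variant)."""
    f = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def bark_band_index(f_hz, n_bands=24):
    """Index (0-based) of the Bark band containing f_hz, clipped to n_bands bands."""
    z = np.floor(hz_to_bark(f_hz)).astype(int)
    return np.clip(z, 0, n_bands - 1)
```

Summing the spectral energy of all bins that share the same band index gives one energy value per Bark band, which is the kind of input the perceptual features need.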

2.5 Amplitude and Frequency scale<br />

2.5.1 Amplitude scales<br />


2.5.2 Frequency scales<br />

[Figure: one spectrum displayed with three amplitude scales (Ampl, Log-ampl, Power) and two frequency scales (Freq, Log-freq).]<br />

2.6 Descriptors on Spectrum / Harmonic peaks / Bark bands<br />


3 Global temporal features<br />

3.1 Envelope characterization<br />

3.1.1 Attack / Decay / Sustain / Release envelope modeling<br />

[Figure: envelope models. Sustained sound: attack, decay, sustain, release. Non-sustained sound: attack, rest.]<br />

3.1.2 Attack part<br />

3.1.2.1 Estimation of the start and end of the attack<br />

[Figure: estimation of the attack start and end on the energy envelope (energy versus time). One panel marks the 20% and 90% levels; the other marks successive thresholds (threshold 1, threshold 2, ...) and the efforts between them (effort 12, effort 23, ...).]<br />

3.1.2.2 Log-Attack Time (mpeg7:LogAttackTime) DT.g_lat<br />

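$$ \mathrm{lat} = \log_{10}(t_{\mathrm{end}} - t_{\mathrm{start}}) $$<br />

where t_start and t_end are the start and end times of the attack.<br />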
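A minimal sketch of the log-attack time, assuming the simplest fixed-threshold estimate of the attack bounds: the attack starts when the envelope first reaches 20% of its maximum and ends when it first reaches 90%. The 20% and 90% levels appear in the figure of the previous section; the function names and the use of a precomputed energy envelope are assumptions, and the threshold-and-effort method from the same figure is not implemented here.

```python
import numpy as np

def attack_bounds(envelope, sample_rate, start_frac=0.2, end_frac=0.9):
    """Return (t_start, t_end) in seconds using fixed fractions of the peak."""
    env = np.asarray(envelope, dtype=float)
    peak = env.max()
    # argmax on a boolean array returns the first index where it is True
    start_idx = int(np.argmax(env >= start_frac * peak))
    end_idx = int(np.argmax(env >= end_frac * peak))
    return start_idx / sample_rate, end_idx / sample_rate

def log_attack_time(envelope, sample_rate, start_frac=0.2, end_frac=0.9):
    """Base-10 logarithm of the attack duration in seconds."""
    t_start, t_end = attack_bounds(envelope, sample_rate, start_frac, end_frac)
    # guard against a zero-length attack (assumption: clamp to one sample)
    duration = max(t_end - t_start, 1.0 / sample_rate)
    return float(np.log10(duration))
```

For an envelope that rises linearly over one second, the attack measured this way lasts about 0.7 s, so the log-attack time is close to log10(0.7).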

3.1.2.3 Temporal increase (cuidado:TemporalIncrease) DT.g_incr<br />

3.1.3 Sustain part<br />


3.1.3.1 Decrease part: Temporal decrease (cuidado:TemporalDecrease) DT.g_decr<br />

$$ s(t) = A \cdot e^{-\alpha (t - t_{\max})}, \qquad t > t_{\max} $$<br />
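One way to estimate the decay rate α of an exponential model is a straight-line fit to the logarithm of the envelope after its maximum. The sketch below assumes that approach; the function name, the small floor that avoids log(0), and the sign convention (a positive value means decay) are assumptions and may differ from the paper's implementation.

```python
import numpy as np

def temporal_decrease(envelope, sample_rate, floor=1e-12):
    """Estimate alpha in env(t) ~ A * exp(-alpha * (t - t_max)) for t > t_max."""
    env = np.asarray(envelope, dtype=float)
    i_max = int(np.argmax(env))
    tail = np.maximum(env[i_max:], floor)      # avoid log(0)
    t = np.arange(tail.size) / sample_rate     # time measured from the maximum
    slope, _intercept = np.polyfit(t, np.log(tail), 1)
    return float(-slope)
```

The fit needs at least two samples after the envelope maximum.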

3.1.3.2 Sustain part: Energy Modulation and Fundamental frequency modulation<br />

(mpeg7:AudioPower ScalableSeriesType element name="Modulation")<br />

(mpeg7:AudioFundamentalFrequency ScalableSeriesType element name="Modulation")<br />

3.1.4 Example<br />

[Figure: attack and decrease estimation for the file F:\data\class\sol\sust\bowedstring\alto\mf\alto\_a\_gref\_mf\_si3\_12.wav. Panel title: Dlat: -0.53981 - threshold: 0.15 - Dincr: 3.265 - Ddecr: -0.28535. Curves: incr, incr2, desc. Marked positions: satt_posn, eatt_posn, maxenv_posn.]<br />

[Figure: modulation estimation. Panel title: MODam: 0.060872 - MODfr: 5.3833. Curves: envelop-v, polyfit, hatenvelop-v, fft(envelop-v - polyfit).]<br />

3.2 Others<br />

3.2.1 Temporal centroid (mpeg7:TemporalCentroid) DT.g_tc<br />

$$ tc = \frac{\sum_t e(t) \cdot t}{\sum_t e(t)} $$<br />

where e(t) is the energy envelope.<br />
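A short sketch of the temporal centroid as the energy-weighted mean time of the envelope. The function name and the sampled-envelope input are assumptions made for illustration.

```python
import numpy as np

def temporal_centroid(envelope, sample_rate):
    """Energy-weighted average time (seconds) of a sampled energy envelope."""
    env = np.asarray(envelope, dtype=float)
    t = np.arange(env.size) / sample_rate
    return float(np.sum(t * env) / np.sum(env))
```

A symmetric envelope has its centroid at its midpoint; a percussive envelope with a long tail has a centroid close to its onset.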

3.2.2 Effective Duration (cuidado:TemporalEffectiveDuration) DT.g_ed<br />

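The definition text for the effective duration is missing from this copy; only the labels "threshold" and "effective duration" survive in the accompanying figure. A plausible reading, sketched below, is the total time during which the energy envelope stays above a threshold set relative to its maximum. The default ratio of 0.4 and the function name are assumptions, not values confirmed by the surviving text.

```python
import numpy as np

def effective_duration(envelope, sample_rate, threshold=0.4):
    """Time (seconds) the envelope spends at or above threshold * max(envelope)."""
    env = np.asarray(envelope, dtype=float)
    above = env >= threshold * env.max()
    return float(np.count_nonzero(above) / sample_rate)
```

Note that this counts every sample above the threshold, whether or not those samples are contiguous.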

4 Instantaneous temporal features<br />

4.1 Auto-correlation (cuidado:AudioXcorr) DT.i_xcorr_m<br />

$$ \mathrm{xcorr}(k) = \sum_{n=0}^{N-k-1} x(n) \cdot x(n+k) $$<br />

[Figure: effective duration. Energy envelope (amplitude versus time) with the threshold and the effective duration marked.]<br />

[Figure: auto-correlation. Panels: signal and xcorr (amplitude versus time), and amplitude versus frequency.]<br />

4.2 Zero-crossing rate (cuidado:AudioZcr) DT.i_zcr_v<br />
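The two instantaneous temporal features can be sketched on a single signal frame as follows. The count of 12 auto-correlation coefficients follows the descriptor list in section 11; the normalization by the lag-0 value, the choice of lags 1 to 12, and the expression of the zero-crossing rate in crossings per second are assumptions. The function names are illustrative.

```python
import numpy as np

def autocorrelation(frame, n_lags=12):
    """Auto-correlation at lags 1..n_lags, normalized by the lag-0 value (assumed)."""
    x = np.asarray(frame, dtype=float)
    n = x.size
    r0 = np.dot(x, x)
    if r0 == 0.0:
        return np.zeros(n_lags)
    # sum over n = 0 .. N-k-1 of x(n) * x(n+k)
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(1, n_lags + 1)])
    return r / r0

def zero_crossing_rate(frame, sample_rate):
    """Number of sign changes per second in the frame."""
    x = np.asarray(frame, dtype=float)
    signs = np.signbit(x)
    crossings = np.count_nonzero(signs[1:] != signs[:-1])
    return crossings * sample_rate / x.size
```

A pure tone crosses zero twice per period, so its zero-crossing rate is about twice its frequency, while noise produces much higher values.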

5 Energy features<br />

5.1 Total Energy (mpeg7:AudioPower) DE.i_tot_v<br />

5.2 Harmonic Part Energy (cuidado:AudioHarmonicPower) DE.i_harmo_v<br />


5.3 Noise Part Energy (cuidado:AudioNoisePower) DE.i_noise_v<br />


6 Spectral features<br />

6.1 Spectral shape description<br />

6.1.1 Spectral centroid (mpeg7:AudioSpectrumCentroid) DS.i_sc_v<br />

$$ \mu = \int x \cdot p(x)\, \delta x $$<br />

• x: the observed data (here, the frequencies of the spectrum)<br />

• p(x): the probability of observing x (here, the amplitude at x divided by the sum of the amplitudes)<br />

6.1.2 Spectral spread (mpeg7:AudioSpectrumSpread) DS.i_ss_v<br />

$$ \sigma^2 = \int (x - \mu)^2 \cdot p(x)\, \delta x $$<br />

6.1.3 Spectral skewness (cuidado:AudioSpectrumSkewness) DS.i_skew_v<br />

$$ m_3 = \int (x - \mu)^3 \cdot p(x)\, \delta x, \qquad \gamma_1 = \frac{m_3}{\sigma^3} $$<br />

[Figure: three example distributions, each with curves "data" and "gauss fit". Panel titles: "mean: 7.872e-017 std: 5 skew: -8.3254e-017 kurt: 3"; "mean: 16.67 std: 23.5714 skew: -0.56569 kurt: 2.4"; "mean: -16.67 std: 23.5714 skew: 0.56569 kurt: 2.4".]<br />

6.1.4 Spectral kurtosis (cuidado:AudioSpectrumKurtosis) DS.i_kurto_v<br />

$$ m_4 = \int (x - \mu)^4 \cdot p(x)\, \delta x, \qquad \gamma_2 = \frac{m_4}{\sigma^4} $$<br />

[Figure: three example distributions, each with curves "data" and "gauss fit". Panel titles: "mean: 7.872e-017 std: 5 skew: -8.3254e-017 kurt: 3"; "mean: -2.1597e-015 std: 28.8704 skew: 3.1204e-016 kurt: 1.8"; "mean: 0.0049998 std: 1.4142 skew: 5.3032e-007 kurt: 6.0003".]<br />
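The four spectral shape descriptors (centroid, spread, skewness, kurtosis) are the first four standardized moments of the spectrum treated as a distribution. The sketch below assumes a discrete spectrum given as bin frequencies and amplitudes, with the amplitudes normalized to sum to one; the function name and argument names are illustrative.

```python
import numpy as np

def spectral_moments(freqs, amps):
    """Return (centroid, spread, skewness, kurtosis) of a discrete spectrum."""
    x = np.asarray(freqs, dtype=float)
    a = np.asarray(amps, dtype=float)
    p = a / a.sum()                       # amplitudes as a probability distribution
    mu = np.sum(x * p)                    # centroid
    sigma = np.sqrt(np.sum((x - mu) ** 2 * p))   # spread (standard deviation)
    skew = np.sum((x - mu) ** 3 * p) / sigma ** 3
    kurt = np.sum((x - mu) ** 4 * p) / sigma ** 4
    return float(mu), float(sigma), float(skew), float(kurt)
```

This reproduces the reference values in the figure panel titles: a flat distribution has a kurtosis close to 1.8 and a Gaussian one a kurtosis of 3, both with zero skewness. The sketch does not handle a spectrum concentrated in a single bin, where the spread is zero.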

6.1.5 Spectral slope (cuidado:AudioSpectrumSlope) DS.i_slope_v<br />

$$ \hat{a}(f) = \mathrm{slope} \cdot f + \mathrm{const} $$<br />
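The spectral slope is obtained by fitting a straight line to the amplitude spectrum. The sketch below uses an ordinary least-squares fit through `numpy.polyfit`; any additional normalization the paper may apply (for example by the total amplitude) is not recoverable from this copy and is therefore left out. The function name is illustrative.

```python
import numpy as np

def spectral_slope(freqs, amps):
    """Least-squares line through (frequency, amplitude); returns (slope, const)."""
    f = np.asarray(freqs, dtype=float)
    a = np.asarray(amps, dtype=float)
    slope, const = np.polyfit(f, a, 1)
    return float(slope), float(const)
```

A negative slope means the amplitude decreases towards high frequencies, as is typical for most natural sounds.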

6.1.6 Spectral decrease (cuidado:AudioSpectrumDecrease) DS.i_decr_v<br />

$$ \mathrm{decrease} = \frac{1}{\sum_{k=2}^{K} a(k)} \sum_{k=2}^{K} \frac{a(k) - a(1)}{k - 1} $$<br />
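A minimal sketch of a spectral decrease measure, assuming the commonly cited form in which the amplitude difference between bin k and the first bin is weighted by 1/(k − 1) and normalized by the sum of the amplitudes of bins 2 to K. Whether this matches the paper's formula term by term cannot be verified from this copy. The function name is illustrative.

```python
import numpy as np

def spectral_decrease(amps):
    """Average of (a(k) - a(1)) / (k - 1) for k = 2..K, normalized by sum of a(2..K)."""
    a = np.asarray(amps, dtype=float)
    k_minus_1 = np.arange(1, a.size)          # k - 1 for k = 2..K
    return float(np.sum((a[1:] - a[0]) / k_minus_1) / np.sum(a[1:]))
```

A flat spectrum gives zero; a spectrum that falls off after its first bin gives a negative value.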

6.1.7 Spectral roll-off (cuidado:AudioSpectrumRollOff) DS.i_rolloff_v<br />

$$ \sum_{f=0}^{f_c} a^2(f) = 0.95 \sum_{f=0}^{sr/2} a^2(f) $$<br />

where f_c is the roll-off frequency and sr/2 is the Nyquist frequency.<br />
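The roll-off frequency can be found from the cumulative energy of the spectrum. In the sketch below the fraction of 0.95 is an assumption (other values such as 0.85 are also used in the literature), as are the function name and the use of squared amplitudes as energy.

```python
import numpy as np

def spectral_rolloff(freqs, amps, fraction=0.95):
    """Lowest frequency below which `fraction` of the spectral energy is contained."""
    f = np.asarray(freqs, dtype=float)
    energy = np.asarray(amps, dtype=float) ** 2
    cumulative = np.cumsum(energy)
    # first bin whose cumulative energy reaches the target
    idx = int(np.searchsorted(cumulative, fraction * cumulative[-1]))
    return float(f[min(idx, f.size - 1)])
```

For a flat spectrum the roll-off sits at 95% of the frequency range; for a spectrum dominated by low partials it is much lower.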

<br />

6.2 Temporal variation of spectrum<br />

6.2.1 Temporal variation of spectrum: spectral variation (cuidado:AudioSpectrumVariation) DS.i_var_v<br />

$$ \mathrm{variation} = 1 - \frac{\sum_k a(t-1, k) \cdot a(t, k)}{\sqrt{\sum_k a(t-1, k)^2}\, \sqrt{\sum_k a(t, k)^2}} $$<br />
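The spectral variation compares the amplitude spectra of two successive frames. A minimal sketch, assuming it is defined as one minus the normalized cross-correlation of the two spectra; the function name and the convention of returning 0 when either frame is silent are assumptions.

```python
import numpy as np

def spectral_variation(prev_amps, cur_amps):
    """1 - normalized cross-correlation between two successive amplitude spectra."""
    a0 = np.asarray(prev_amps, dtype=float)
    a1 = np.asarray(cur_amps, dtype=float)
    denom = np.sqrt(np.sum(a0 ** 2)) * np.sqrt(np.sum(a1 ** 2))
    if denom == 0.0:
        return 0.0          # assumption: silence counts as "no variation"
    return float(1.0 - np.sum(a0 * a1) / denom)
```

The value is close to 0 when the two spectra have the same shape (even at different levels) and close to 1 when they share no energy in common bins.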

6.3 Global spectral shape description<br />

6.3.1 Mel Frequency Cepstral Coefficients (MFCC) (cuidado:AudioMFCC) DP.i_MFCC_m<br />

[Figure: MFCC computation chain: s(n), FFT, Mel bands, Log, DCT, MFCC.]<br />

$$ \Delta\mathrm{MFCC} = \frac{\partial\, \mathrm{MFCC}}{\partial t}, \qquad \Delta\Delta\mathrm{MFCC} = \frac{\partial\, \Delta\mathrm{MFCC}}{\partial t} $$<br />
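The chain FFT, Mel bands, Log, DCT can be sketched for one frame as follows. Everything beyond the chain itself is an assumption made for illustration: the Hz-to-mel mapping, the triangular filter shape, the Hanning window, the DCT-II basis, the small constant added before the logarithm, and the inclusion of coefficient 0. The counts of 24 bands and 12 coefficients follow the figure title and the descriptor list.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_bands, n_fft, sample_rate):
    """Triangular filters equally spaced on the mel axis (assumed shape)."""
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_bands + 2))
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_bands, bin_freqs.size))
    for b in range(n_bands):
        lo, mid, hi = edges_hz[b:b + 3]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        bank[b] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mfcc(frame, sample_rate, n_bands=24, n_coeffs=12):
    """MFCC of one frame: |FFT| -> mel bands -> log -> DCT-II."""
    x = np.asarray(frame, dtype=float)
    spectrum = np.abs(np.fft.rfft(x * np.hanning(x.size)))
    band_energy = mel_filterbank(n_bands, x.size, sample_rate) @ spectrum
    log_energy = np.log(band_energy + 1e-12)   # small constant avoids log(0)
    n = np.arange(n_bands)
    k = np.arange(n_coeffs)[:, None]
    dct_basis = np.cos(np.pi * k * (n + 0.5) / n_bands)
    return dct_basis @ log_energy
```

Delta and delta-delta coefficients can then be approximated by differencing the MFCC vectors of successive frames, for example with `numpy.diff` along the time axis.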

[Figure: MFCC example. Panels: log-amplitude versus frequency (curves: spectrum, mid-ear spectrum); log-amplitude versus Mel band (curves: Mel band spectrum, MFCC spectrum); value versus MFC coefficient (MFCC).]<br />

7 Harmonic features<br />

7.1.1 Fundamental frequency (mpeg7:AudioFundamentalFrequency) DH.i_f0_v<br />

7.1.2 Noisiness (mpeg7:AudioHarmonicity) DH.i_noisiness_v<br />

7.1.3 Inharmonicity (cuidado:AudioInharmonicity) DH.i_inharmo_v<br />

$$ \mathrm{inharmo} = \frac{2}{f_0} \cdot \frac{\sum_h |f(h) - h \cdot f_0| \cdot a^2(h)}{\sum_h a^2(h)} $$<br />

[Figure: inharmonicity. Energy versus frequency, showing the measured partial frequencies f(1), f(2), ..., f(6) against the harmonic positions f0, 2 f0, ..., 7 f0.]<br />
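A sketch of an inharmonicity measure, assuming it is the energy-weighted mean deviation of the measured partial frequencies f(h) from the ideal harmonic positions h·f0, scaled by 2/f0. The function name and argument layout (partials ordered by harmonic rank, starting at h = 1) are assumptions.

```python
import numpy as np

def inharmonicity(partial_freqs, partial_amps, f0):
    """Energy-weighted deviation of partials from h * f0, scaled by 2 / f0."""
    f = np.asarray(partial_freqs, dtype=float)
    a = np.asarray(partial_amps, dtype=float)
    h = np.arange(1, f.size + 1)              # harmonic ranks 1..H
    energy = a ** 2
    return float((2.0 / f0) * np.sum(np.abs(f - h * f0) * energy) / np.sum(energy))
```

A perfectly harmonic sound gives 0; partials that are each shifted by 5 Hz around a 100 Hz fundamental give 0.1.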

7.1.4 Harmonic Spectral Deviation (mpeg7:HarmonicSpectralDeviation) DH.i_devs_v<br />

$$ \mathrm{HDEV} = \frac{1}{H} \sum_h \left( a(h) - SE(h) \right) $$<br />

where SE(h) is the value of the spectral envelope at harmonic h.<br />

[Figure: harmonic spectral deviation. Amplitude versus Frequency [harm number], with curves "spectral envelop" and "harmonics". Panel title: Spectral deviation: 0.15374.]<br />

7.1.5 Odd to Even Harmonic Energy Ratio (cuidado:HarmonicSpectralOERatio): DH.i_oeratio_v<br />

$$ \mathrm{OER} = \frac{\sum_{h = 1, 3, 5, \ldots} a^2(h)}{\sum_{h = 2, 4, 6, \ldots} a^2(h)} $$<br />

[Figure: odd to even harmonic energy ratio. Amplitude versus Frequency [harmonic number]. Panel title: Odd/even harmonic energy ratio: 3.2431.]<br />

7.1.6 Tristimulus (cuidado:HarmonicSpectralTristimulus): DH.i_tri*_v<br />

$$ T_1 = \frac{a(1)}{\sum_{h=1}^{H} a(h)}, \qquad T_2 = \frac{a(2) + a(3) + a(4)}{\sum_{h=1}^{H} a(h)}, \qquad T_3 = \frac{\sum_{h=5}^{H} a(h)}{\sum_{h=1}^{H} a(h)} $$<br />

[Figure: tristimulus. Amplitude versus harmonic number, with odd and even harmonics and the tristimulus1, tristimulus2 and tristimulus3 groups marked. Panel title: tri1: 0.49442 tri2: 0.45368 tri3: 0.0519.]<br />
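The odd-to-even ratio and the tristimulus both operate on the amplitudes of the harmonic peaks. The sketch below assumes that the input array holds a(1), a(2), ..., a(H) in harmonic order, that the odd-to-even ratio uses squared amplitudes (energy), and that the three tristimulus values are normalized by the sum of all harmonic amplitudes so that they add up to one, as the values in the figure title do. The function names are illustrative.

```python
import numpy as np

def odd_even_ratio(harm_amps):
    """Energy of odd harmonics (1, 3, 5, ...) over energy of even harmonics."""
    a = np.asarray(harm_amps, dtype=float)
    odd_energy = np.sum(a[0::2] ** 2)      # index 0 is harmonic 1
    even_energy = np.sum(a[1::2] ** 2)     # index 1 is harmonic 2
    return float(odd_energy / even_energy)

def tristimulus(harm_amps):
    """Relative weight of harmonic 1, harmonics 2-4, and harmonics 5..H."""
    a = np.asarray(harm_amps, dtype=float)
    total = a.sum()
    return (float(a[0] / total),
            float(a[1:4].sum() / total),
            float(a[4:].sum() / total))
```

Sounds with mostly odd harmonics, such as a clarinet in its low register, give a large odd-to-even ratio; the ratio is undefined when no even harmonic carries energy.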

8 Perceptual features<br />

8.1 Features<br />

8.1.1 Total Loudness and specific loudness (cuidado:AudioLoudness): DP.i_loud_v<br />

$$ N'(z) = E(z)^{0.23} $$<br />

$$ N = \sum_{z=1}^{bands} N'(z) $$<br />

8.1.2 Relative Specific Loudness (cuidado:AudioRelativeSpecificLoudness): DP.i_specloudnorm_m<br />

$$ N'_{rel}(z) = \frac{N'(z)}{N} $$<br />

8.1.3 Sharpness (cuidado:AudioSharpness) DP.i_sharp_v<br />

$$ A = 0.11 \cdot \frac{\sum_{z=1}^{bands} z \cdot g(z) \cdot N'(z)}{N} $$<br />

$$ g(z) = 1 \ \text{for } z < 15, \qquad g(z) = 0.066 \cdot e^{0.171 \cdot z} \ \text{for } z \geq 15 $$<br />

8.1.4 Spread (cuidado:AudioSpread) DP.i_spread_v<br />

$$ ET = \left( \frac{N - \max_z N'(z)}{N} \right)^2 $$<br />

9 Various features<br />

9.1 Spectral Flatness/Crest measure (mpeg7:AudioSpectrumFlatness) DP.sfm_m<br />

$$ \mathrm{SFM}(band) = \frac{\left( \prod_{k \in band} a(k) \right)^{1/K}}{\frac{1}{K} \sum_{k \in band} a(k)} $$<br />

$$ \mathrm{SCM}(band) = \frac{\max_{k \in band} a(k)}{\frac{1}{K} \sum_{k \in band} a(k)} $$<br />

where K is the number of spectrum bins k in the band.<br />
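A minimal sketch of the flatness and crest measures per frequency band: flatness as the ratio of the geometric mean to the arithmetic mean of the amplitudes, crest as the ratio of the maximum to the arithmetic mean. The descriptor list gives four values for each measure; the four band limits used below are hypothetical, since the band definitions did not survive in this copy. The small constant `eps` and the function names are also assumptions.

```python
import numpy as np

# Hypothetical band limits in Hz; the actual bands are not recoverable here.
ASSUMED_BANDS_HZ = ((250.0, 500.0), (500.0, 1000.0), (1000.0, 2000.0), (2000.0, 4000.0))

def spectral_flatness(amps, eps=1e-12):
    """Geometric mean over arithmetic mean (1 = flat/noisy, near 0 = peaky/tonal)."""
    a = np.asarray(amps, dtype=float) + eps    # eps avoids log(0)
    return float(np.exp(np.mean(np.log(a))) / np.mean(a))

def spectral_crest(amps, eps=1e-12):
    """Maximum over arithmetic mean (1 = flat, large = one dominant peak)."""
    a = np.asarray(amps, dtype=float)
    return float(a.max() / (a.mean() + eps))

def band_flatness_crest(freqs, amps, bands=ASSUMED_BANDS_HZ):
    """(flatness, crest) for each band; (nan, nan) if a band contains no bin."""
    f = np.asarray(freqs, dtype=float)
    a = np.asarray(amps, dtype=float)
    out = []
    for lo, hi in bands:
        sel = a[(f >= lo) & (f < hi)]
        if sel.size == 0:
            out.append((float("nan"), float("nan")))
        else:
            out.append((spectral_flatness(sel), spectral_crest(sel)))
    return out
```

Computing the geometric mean through the mean of logarithms avoids the numerical underflow that a direct product over many bins would cause.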

10 Temporal modeling<br />

10.1.1 Mean<br />

$$ \mu = \frac{1}{T} \sum_{t=1}^{T} x(t) $$<br />

10.1.2 Variance<br />

$$ \sigma^2 = \frac{1}{T} \sum_{t=1}^{T} \left( x(t) - \mu \right)^2 $$<br />

10.1.3 Deviation<br />

10.1.4 Temporal modeling an mpeg-7 audio scalable series<br />

mpeg7::scalableseries. weight<br />

scalableseries AudioLoudnessType<br />

mpeg7:scalableseries numOfElements=1 element name<br />

Element Name | MPEG-7<br />
Mean | yes<br />
Variance | yes<br />
Derivative | extension<br />
Modulation | extension<br />


11 List of all descriptors<br />

LLD List | frame based | number of features | acronym | xml tag<br />
Temporal Features<br />
Global Temporal Features<br />
Log Attack Time | n | 1 | DTg_lat | mpeg7:LogAttackTime<br />
Temporal Increase | n | 1 | DTg_incr | cuidado:TemporalIncrease<br />
Temporal Decrease | n | 1 | DTg_decr | cuidado:TemporalDecrease<br />
Temporal Centroid | n | 1 | DTg_tc | mpeg7:TemporalCentroid<br />
Effective Duration | n | 1 | DTg_ed | cuidado:TemporalEffectiveDuration<br />
Instantaneous Temporal Features<br />
Signal Auto-correlation function | y | 12 | DTi_xcorr_m | cuidado:AudioXcorr<br />
Zero-crossing rate | y | 1 | DTi_zcr | cuidado:AudioZcr<br />
Energy Features<br />
Total energy | y | 1 | DEi_tot_v | mpeg7:AudioPower<br />
Total energy Modulation (frequency, amplitude) | n | 2 | DTg_mod_fr, DTg_mod_am | ScalableSeriesType element name="Modulation"<br />
Total harmonic energy | y | 1 | DEi_harmo_v | cuidado:AudioHarmonicPower<br />
Total noise energy | y | 1 | DEi_noise_v | cuidado:AudioNoisePower<br />
Spectral Features<br />
Spectral Shape<br />
Spectral centroid | y | 6 | DSi_sc_m | mpeg7:AudioSpectrumCentroid (mpeg7:SpectralCentroid)<br />
Spectral spread | y | 6 | DSi_ss_m | mpeg7:AudioSpectrumSpread<br />
Spectral skewness | y | 6 | DSi_skew_m | cuidado:AudioSpectrumSkewness<br />
Spectral kurtosis | y | 6 | DSi_kurto_v | cuidado:AudioSpectrumKurtosis<br />
Spectral slope | y | 6 | DSi_slope_v | cuidado:AudioSpectrumSlope<br />
Spectral decrease | y | 1 | DSi_decs_c | cuidado:AudioSpectrumDecrease<br />
Spectral rolloff | y | 1 | DSi_rolloff_v | cuidado:AudioSpectrumRollOff<br />
Spectral variation | y | 3 | DSi_variation_v | cuidado:AudioSpectrumVariation<br />
Global spectral shape description<br />
MFCC | y | 12 | DPi_mfcc_m | cuidado:AudioMFCC<br />
Delta MFCC | y (post) | 12 | DPi_Dmfcc_m |<br />
Delta Delta MFCC | y (post) | 12 | DPi_DDmfcc_m |<br />
Harmonic Features<br />
Fundamental frequency | y | 1 | DHi_f0_v | mpeg7:AudioFundamentalFrequency<br />
Fundamental fr. Modulation (frequency, amplitude) | n | 2 | F0 Mod AM, FR | ScalableSeriesType element name="Modulation"<br />
Noisiness | y | 1 | DHi_noisiness_v | mpeg7:AudioHarmonicity<br />
Inharmonicity | y | 1 | DHi_inharmo_v | cuidado:AudioInharmonicity<br />
Harmonic Spectral Deviation | y | 3 | DHi_devs_v | mpeg7:HarmonicSpectralDeviation<br />
Odd to Even Harmonic Ratio | y | 3 | DHi_oeratio_v | cuidado:HarmonicSpectralOERatio<br />
Harmonic Tristimulus | y | 9 | DHi_tri_v | cuidado:HarmonicSpectralTristimulus<br />
Harmonic Spectral Shape<br />
Harmonic Spectral centroid | y | 6 | DHi_sc_m | mpeg7:HarmonicSpectralCentroid<br />
Harmonic Spectral spread | y | 6 | DHi_ss_m | mpeg7:HarmonicSpectralSpread<br />
Harmonic Spectral skewness | y | 6 | DHi_skew_m | cuidado:HarmonicSpectralSkewness<br />
Harmonic Spectral kurtosis | y | 6 | DHi_kurto_v | cuidado:HarmonicSpectralKurtosis<br />
Harmonic Spectral slope | y | 6 | DHi_slope_v | cuidado:HarmonicSpectralSlope<br />
Harmonic Spectral decrease | y | 1 | DHi_decs_c | cuidado:HarmonicSpectralDecrease<br />
Harmonic Spectral rolloff | y | 1 | DHi_rolloff_v | cuidado:HarmonicSpectralRollOff<br />
Harmonic Spectral variation | y | 3 | DHi_variation_v | mpeg7:HarmonicSpectralVariation<br />
Perceptual Features<br />
Loudness | y | 1 | DPi_loud_v | cuidado:AudioLoudness<br />
Relative Specific Loudness | y | 24 | DPi_specloud_m | cuidado:AudioRelativeSpecificLoudness<br />
Sharpness | y | 1 | DPi_sharp_v | cuidado:AudioSharpness<br />
Spread | y | 1 | DPi_spread_v | cuidado:AudioSpread<br />
Perceptual Spectral Envelope Shape<br />
Perceptual Spectral centroid | y | 6 | DPi_sc_m | cuidado:AudioFilterbankCentroid<br />
Perceptual Spectral spread | y | 6 | DPi_ss_m | cuidado:AudioFilterbankSpread<br />
Perceptual Spectral skewness | y | 6 | DPi_skew_m | cuidado:AudioFilterbankSkewness<br />
Perceptual Spectral kurtosis | y | 6 | DPi_kurto_v | cuidado:AudioFilterbankKurtosis<br />
Perceptual Spectral Slope | y | 6 | DPi_slope_v | cuidado:AudioFilterbankSlope<br />
Perceptual Spectral Decrease | y | 1 | DPi_decs_c | cuidado:AudioFilterbankDecrease<br />
Perceptual Spectral Rolloff | y | 1 | DPi_rolloff_v | cuidado:AudioFilterbankRolloff<br />
Perceptual Spectral Variation | y | 3 | DPi_variation_v | cuidado:AudioFilterbankVariation<br />
Odd to Even Band Ratio | y | 3 | DPi_oeratio_v | cuidado:AudioFilterbankOERatio<br />
Band Spectral Deviation | y | 3 | DPi_devs_v | cuidado:AudioFilterbankDeviation<br />
Band Tristimulus | y | 9 | DPi_tri_v | cuidado:AudioFilterbankTristimulus<br />
Various features<br />
Spectral flatness | y | 4 | DPi_sfm_m | mpeg7:AudioSpectrumFlatness<br />
Spectral crest | y | 4 | DPi_scm_m | cuidado:AudioSpectrumCrest<br />
Total Number of Features | 166<br />

12 Acknowledgement<br />
