improving music mood classification using lyrics, audio and social tags

There are two popular approaches in assembling hybrid systems (also called “fusion methods”). The most straightforward one is feature concatenation, where two feature sets are concatenated and the classification algorithms run on the combined feature vectors (e.g., Laurier et al., 2008; Mayer et al., 2008). The other method is often called “late fusion,” which combines the outputs of individual classifiers based on different sources, either by (weighted) averaging (e.g., Bischoff et al., 2009b; Whitman & Smaragdis, 2002) or by multiplying (e.g., Li & Ogihara, 2004). To answer this research question, both fusion methods are implemented, and the performances of hybrid systems and systems based on single sources are compared using statistical tests.
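The two fusion methods can be sketched as follows. This is a minimal illustration with synthetic data, not the systems evaluated in this thesis; all array names and values are made up.

```python
import numpy as np

# Illustrative sketch of the two fusion methods; all data here is synthetic.
rng = np.random.default_rng(0)
audio_feats = rng.normal(size=(4, 3))   # 4 songs x 3 audio features
lyric_feats = rng.normal(size=(4, 5))   # 4 songs x 5 lyric features

# Early fusion (feature concatenation): one combined vector per song,
# fed to a single classifier.
combined = np.hstack([audio_feats, lyric_feats])   # shape (4, 8)

# Late fusion: combine per-class probabilities produced by two
# separately trained classifiers (probabilities below are invented).
p_audio = np.array([[0.7, 0.3], [0.4, 0.6], [0.9, 0.1], [0.2, 0.8]])
p_lyric = np.array([[0.6, 0.4], [0.3, 0.7], [0.8, 0.2], [0.5, 0.5]])

# (a) weighted averaging, with weight w on the audio-based classifier
w = 0.6
p_avg = w * p_audio + (1 - w) * p_lyric

# (b) multiplying, then renormalizing each row so the classes sum to 1
p_mult = p_audio * p_lyric
p_mult = p_mult / p_mult.sum(axis=1, keepdims=True)
```

Note that averaging preserves a valid probability distribution automatically, whereas the product rule requires renormalization.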

1.3.5 Learning Curves and Audio Lengths

A learning curve describes the relationship between classification performance and the number of training examples. Usually performance increases with the number of training examples, and the point where performance stops increasing indicates the minimum number of training examples needed to achieve the best performance. In addition to classification performance, the learning curve is thus an important measure of the effectiveness of a classification system. Therefore, comparing the learning curves of the hybrid systems and the single-source-based systems can reveal whether combining lyrics and audio helps reduce the number of training examples needed to achieve performances comparable to or better than those of audio-only or lyric-only systems.
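A learning curve is obtained by training on increasingly large subsets and measuring performance on a held-out test set. The sketch below uses a toy nearest-centroid classifier on synthetic two-class data purely to illustrate the procedure; it is not the classification method used in this thesis.

```python
import numpy as np

def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Toy classifier: assign each test point to the nearest class centroid."""
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in (0, 1)])
    dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return float((dists.argmin(axis=1) == y_test).mean())

rng = np.random.default_rng(1)
# Two well-separated synthetic classes, interleaved so that any
# training prefix contains examples of both classes.
X = np.empty((200, 2))
X[0::2] = rng.normal(0.0, 1.0, (100, 2))
X[1::2] = rng.normal(4.0, 1.0, (100, 2))
y = np.tile([0, 1], 100)
X_test, y_test = X[150:], y[150:]

# Learning curve: test accuracy as a function of training-set size.
sizes = [10, 20, 40, 80, 150]
curve = [nearest_centroid_accuracy(X[:n], y[:n], X_test, y_test)
         for n in sizes]
```

The point where `curve` flattens out estimates the minimum number of training examples needed; comparing such curves across systems shows which system reaches a given accuracy with fewer examples.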

Due to the time complexity of audio processing, MIR systems often process x-second audio clips truncated from the middle of the original tracks instead of the complete tracks, where x often equals 30 or 15. As text processing is much faster than audio processing, it is of practical
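The middle-clip truncation just described amounts to simple index arithmetic on the decoded waveform. A minimal sketch, assuming a sample rate of 22,050 Hz (the function name and parameters are illustrative, not from a specific MIR toolkit):

```python
def middle_clip_bounds(track_seconds, clip_seconds=30, sample_rate=22050):
    """Sample-index range of an x-second clip taken from the middle of a track.

    Illustrative helper: a real pipeline would slice the decoded waveform
    (e.g., a NumPy array of samples) with the returned bounds.
    """
    clip_seconds = min(clip_seconds, track_seconds)
    start_sec = (track_seconds - clip_seconds) / 2.0
    start = int(start_sec * sample_rate)
    end = start + int(clip_seconds * sample_rate)
    return start, end

# A 30-second middle clip from a 200-second track starts at second 85:
start, end = middle_clip_bounds(200, 30)
```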
