
Figure 8.1 shows a general trend: the performance of every system increased with more training data, but the audio-based system improved much more slowly than the others. With 20% of the training samples, the accuracies of the hybrid and lyric-only systems were already higher than the best accuracy the audio-only system achieved with any amount of training data. To reach similar accuracy, the hybrid system needed about 20% fewer training examples than the lyric-only system. This validates the hypothesis that combining lyrics and audio can reduce the number of training examples needed to reach a given classification performance level. In addition, the learning curve of the audio-only system levels off (i.e., stops increasing) at the 80% training sample size, while the curves of the other two systems never level off. This indicates that the hybrid and lyric-only systems may improve further if given more training examples. It is also worth noting that the performances of the lyric-only and audio-only systems drop at the 40% and 70% training sample points, respectively. This observation seems to contradict the general trend that performance increases with the amount of available training data. However, the differences between these points and their neighboring points are not statistically significant (at p < 0.05), so the drops can be regarded as random effects rather than counterexamples to the general trend of the learning curves.
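To make the procedure behind these curves concrete, the following is a minimal sketch of how such a learning-curve comparison could be run, assuming scikit-learn, a generic linear classifier (LinearSVC), a fixed 20% held-out test set, and a paired t-test for comparing adjacent curve points; all of these are illustrative assumptions rather than this research's exact configuration.

```python
# Minimal sketch of a learning-curve experiment with a significance
# check between adjacent points. LinearSVC, the 80/20 pool/test split,
# and the paired t-test are illustrative assumptions, not the exact
# setup used in this research.
import numpy as np
from scipy.stats import ttest_rel
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

def learning_curve(X, y, fractions, n_repeats=10, seed=0):
    """Accuracy on a fixed held-out test set for each training fraction,
    keeping per-repeat accuracies so adjacent points can be compared."""
    X_pool, X_test, y_pool, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    rng = np.random.default_rng(seed)
    runs = {}  # fraction -> list of accuracies over repeats
    for frac in fractions:
        accs = []
        for _ in range(n_repeats):
            # Plain random subsampling keeps the sketch short; a
            # stratified subsample would be safer for small fractions.
            n = max(2, int(frac * len(y_pool)))
            idx = rng.choice(len(y_pool), size=n, replace=False)
            clf = LinearSVC().fit(X_pool[idx], y_pool[idx])
            accs.append(accuracy_score(y_test, clf.predict(X_test)))
        runs[frac] = accs
    return runs

def is_random_dip(runs, frac_a, frac_b, alpha=0.05):
    """Paired t-test over repeats: True if the difference between two
    adjacent curve points is not significant at the given alpha."""
    _, p = ttest_rel(runs[frac_a], runs[frac_b])
    return p >= alpha
```

Plotting the mean of each `runs[frac]` against `frac` yields a learning curve comparable to Figure 8.1, and `is_random_dip` expresses one way to perform the significance check used above to dismiss the drops at the 40% and 70% points as noise.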

8.2 AUDIO LENGTHS

The second part of research question 5 concerns the effect of audio length on classification performance, and whether incorporating lyrics can reduce the length of audio data required to achieve certain performance levels. This research compares the performances of the audio-based, lyric-based, and late-fusion hybrid systems on datasets with audio clips of various lengths.
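The sketch below shows one way such length-controlled audio datasets might be prepared, assuming librosa for decoding; the clip lengths and the mean-MFCC summary are placeholder choices, not the feature extraction actually used in this research.

```python
# Hedged sketch: building fixed-length feature vectors from audio clips
# truncated to several lengths, so the same classifier can be compared
# across lengths. CLIP_LENGTHS and the mean-MFCC summary are
# illustrative placeholders.
import librosa
import numpy as np

CLIP_LENGTHS = [5, 10, 15, 30]  # seconds; illustrative values

def truncated_features(path, seconds, sr=22050, n_mfcc=20):
    """Load at most `seconds` of audio and summarize it as mean MFCCs."""
    y, sr = librosa.load(path, sr=sr, duration=seconds)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)  # one fixed-length vector per clip

def build_datasets(paths):
    """One feature matrix per clip length, over the same set of songs."""
    return {sec: np.vstack([truncated_features(p, sec) for p in paths])
            for sec in CLIP_LENGTHS}
```

Training the same classifier on each of these matrices isolates the effect of audio length on classification performance.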

