21.01.2014 Views

Improving Music Mood Classification Using Lyrics, Audio and Social Tags

According to the average accuracies, both hybrid systems outperformed the single-source-based systems. The box plots also show that the late fusion system had the least performance variance across categories among the four systems and thus was the most stable system. On the other hand, the hybrid system using feature concatenation seemed the least stable.
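The two hybrid strategies compared here differ in where the sources are combined. The following is a minimal sketch, not the study's implementation: it assumes that feature concatenation joins the audio and lyric feature vectors before a single classifier is trained, and that late fusion takes a weighted average of the posterior probabilities produced by separate audio and lyric classifiers. The function names, the weight `alpha`, and the 0.5 decision threshold are hypothetical and do not come from the text.

```python
from typing import List, Sequence


def concatenate_features(audio_vec: Sequence[float],
                         lyric_vec: Sequence[float]) -> List[float]:
    """Feature concatenation (early fusion): one joint vector for one classifier."""
    return list(audio_vec) + list(lyric_vec)


def late_fusion(p_audio: float, p_lyric: float, alpha: float = 0.5) -> float:
    """Late fusion (assumed form): weighted average of two posterior probabilities.

    alpha is a hypothetical weight on the lyric-based classifier;
    (1 - alpha) goes to the audio-based classifier.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * p_lyric + (1.0 - alpha) * p_audio


def predict_positive(probability: float, threshold: float = 0.5) -> bool:
    """Binary decision for one mood category (threshold is an assumption)."""
    return probability >= threshold


# Illustrative values only, not results from the study.
joint_vector = concatenate_features([0.1, 0.2], [0.7])
fused = late_fusion(p_audio=0.4, p_lyric=0.8, alpha=0.5)
print(joint_vector, fused, predict_positive(fused))
```

In this sketch the two classifiers stay independent under late fusion, so an uninformative feature in one source cannot dominate the joint feature space, which is one plausible reason for the difference in stability noted above.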

Table 7.2 presents the average accuracies of these four systems. It shows that the hybrid system with late fusion improved accuracy over the audio-only system by 9.6 and 8.0 percentage points for the top two lyric feature sets, respectively. It can also be seen from Table 7.2 that feature concatenation did not work well for combining the ANEW + TextStyle lyric feature set with audio, as the hybrid system using this method performed worse than the lyric-only system (0.629 vs. 0.637).

Table 7.2 Accuracies of single-source-based and hybrid systems

Feature set       Audio-only   Lyric-only   Feature concatenation   Late fusion
BEST              0.579        0.638        0.645                   0.675
ANEW+TextStyle    0.579        0.637        0.629                   0.659
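The differences quoted in the text can be recomputed directly from Table 7.2. The short sketch below copies the accuracies from the table and reports each difference in percentage points; the dictionary keys and the helper name `gain_in_points` are illustrative choices, not part of the study.

```python
# Accuracies copied from Table 7.2.
TABLE_7_2 = {
    "BEST": {"audio": 0.579, "lyric": 0.638, "concat": 0.645, "late": 0.675},
    "ANEW+TextStyle": {"audio": 0.579, "lyric": 0.637, "concat": 0.629, "late": 0.659},
}


def gain_in_points(row: dict, system: str, baseline: str = "audio") -> float:
    """Accuracy difference between two systems, in percentage points (1 decimal)."""
    return round((row[system] - row[baseline]) * 100, 1)


for feature_set, row in TABLE_7_2.items():
    print(
        feature_set,
        "late fusion vs audio:", gain_in_points(row, "late"),
        "| lyric vs audio:", gain_in_points(row, "lyric"),
        "| concatenation vs lyric:", gain_in_points(row, "concat", baseline="lyric"),
    )
# BEST: 9.6, 5.9, 0.7
# ANEW+TextStyle: 8.0, 5.8, -0.8
```

The negative last value for ANEW+TextStyle corresponds to the case where feature concatenation scored below the lyric-only system (0.629 vs. 0.637).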

The raw difference of 5.9 percentage points between the performances of the lyric-only system and the audio-only system is noteworthy (Table 7.2). The findings of other researchers (e.g., Laurier et al., 2008; Mayer et al., 2008; Yang et al., 2008; Logan et al., 2004) have never shown lyric-only systems to outperform audio-only systems in terms of averaged accuracy across all categories. The author surmises that this difference could be due to the new lyric features applied in this study. However, according to Table 7.3, which lists the results of pair-wise statistical tests on system performances for the top two lyric feature sets, the performance difference between the lyric-only and audio-only systems was just shy of being accepted as significant (p = 0.054 for the BEST feature set), and thus more work is needed in the future before this claim could be
