improving music mood classification using lyrics, audio and social tags

far outweigh the number of training instances. Therefore, it is difficult to make broad generalizations about these extremely sparsely represented mood categories.

Another way to compare the performances is to consider only the larger mood categories with more stable performances. Statistical tests on the performances of these four systems on the nine largest categories, from "calm" to "dreamy", show that the late fusion and feature concatenation hybrid systems significantly outperformed the audio-only system at p = 0.002 and p = 0.009 respectively. In addition, the late fusion hybrid system was also significantly better than the lyric-only system at p = 0.047. There were no other statistically significant differences among the systems.
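The pairwise comparisons above can be illustrated with a small sketch. This is not the thesis's own test procedure or data; it simply shows a paired t-test over per-category accuracies of two systems, with placeholder accuracy values standing in for the real results.

```python
# Hypothetical sketch: paired significance test between two systems'
# accuracies on the same nine mood categories. The accuracy values
# below are made-up placeholders, not the thesis data.
from scipy import stats

audio_only  = [0.61, 0.58, 0.55, 0.63, 0.60, 0.57, 0.59, 0.62, 0.56]
late_fusion = [0.66, 0.63, 0.61, 0.67, 0.64, 0.60, 0.65, 0.66, 0.61]

# A paired test asks whether the mean per-category difference is zero;
# pairing by category controls for categories being easier or harder overall.
t_stat, p_value = stats.ttest_rel(late_fusion, audio_only)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

With paired data like this, an unpaired test would waste the category-level matching and lose statistical power, which is why comparisons across the same categories are typically paired.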

7.4 LYRICS VS. AUDIO ON INDIVIDUAL CATEGORIES

Figure 7.3 also shows that lyrics and audio seem to have different advantages across individual mood categories. Based on the system performances, this section investigates the following two questions: 1) For which moods is audio more useful, and for which moods are lyrics more useful? 2) How do lyric features associate with different mood categories? Answers to these questions can help shed light on a profoundly important music perception question: how does the interaction of sound and text establish a music mood?

Table 7.4 shows the accuracies of audio and lyric feature types on individual mood categories. Each accuracy value was averaged across a 10-fold cross validation. For each lyric feature set, the categories where its accuracy is significantly higher than that of the audio feature set are marked in bold (at p < 0.05). Similarly, for the audio feature set, bold accuracies are those significantly higher than all lyric features (at p < 0.05).
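The per-category evaluation behind Table 7.4 can be sketched as follows. This is a hedged illustration, not the thesis pipeline: the feature matrices are random stand-ins for real audio and lyric features, and the SVM is a placeholder classifier chosen only because it is a common choice for this kind of task.

```python
# Hypothetical sketch: 10-fold cross-validation accuracy for an audio
# feature set vs. a lyric feature set on one binary mood category.
# All data below is synthetic; dimensions and classifier are assumptions.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, n)                               # in category or not
X_audio = rng.normal(size=(n, 63)) + y[:, None] * 0.4   # audio stand-in
X_lyric = rng.normal(size=(n, 100)) + y[:, None] * 0.2  # lyric stand-in

# Accuracy averaged across the 10 folds, as in the table.
acc_audio = cross_val_score(SVC(), X_audio, y, cv=10).mean()
acc_lyric = cross_val_score(SVC(), X_lyric, y, cv=10).mean()
print(f"audio: {acc_audio:.3f}  lyrics: {acc_lyric:.3f}")
```

Repeating this per category, and then testing whether one feature set's fold accuracies are significantly higher than the other's, yields the kind of bold-marked comparison the table reports.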

