improving music mood classification using lyrics, audio and social tags

far outweigh the number of training instances. Therefore, it is difficult to make broad generalizations about these extremely sparsely represented mood categories.

Another way to compare the performances is to consider only the larger mood categories with more stable performances. Statistical tests on the performances of these four systems on the nine largest categories, from "calm" to "dreamy", show that the late fusion and feature concatenation hybrid systems significantly outperformed the audio-only system at p = 0.002 and p = 0.009 respectively. In addition, the late fusion hybrid system was also significantly better than the lyric-only system at p = 0.047. There were no other statistically significant differences among the systems.
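The pairwise comparisons above can be illustrated with a small sketch. This is not the thesis's own test procedure or data; it simply shows a paired t-test over per-category accuracies of two systems, with placeholder accuracy values standing in for the real results.

```python
# Hypothetical sketch: paired significance test between two systems'
# accuracies on the same nine mood categories. The accuracy values
# below are made-up placeholders, not the thesis data.
from scipy import stats

audio_only  = [0.61, 0.58, 0.55, 0.63, 0.60, 0.57, 0.59, 0.62, 0.56]
late_fusion = [0.66, 0.63, 0.61, 0.67, 0.64, 0.60, 0.65, 0.66, 0.61]

# A paired test asks whether the mean per-category difference is zero;
# pairing by category controls for categories being easier or harder overall.
t_stat, p_value = stats.ttest_rel(late_fusion, audio_only)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

With paired data like this, an unpaired test would waste the category-level matching and lose statistical power, which is why comparisons across the same categories are typically paired.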

7.4 LYRICS VS. AUDIO ON INDIVIDUAL CATEGORIES

Figure 7.3 also shows that lyrics and audio seem to have different advantages across individual mood categories. Based on the system performances, this section investigates the following two questions: 1) For which moods is audio more useful, and for which moods are lyrics more useful? 2) How do lyric features associate with different mood categories? Answers to these questions can help shed light on a profoundly important music perception question: how does the interaction of sound and text establish a music mood?

Table 7.4 shows the accuracies of audio and lyric feature types on individual mood categories. Each accuracy value was averaged across a 10-fold cross validation. For each lyric feature set, the categories where its accuracy is significantly higher than that of the audio feature set are marked in bold (at p < 0.05). Similarly, for the audio feature set, bold accuracies are those significantly higher than all lyric features (at p < 0.05).
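The per-category evaluation behind Table 7.4 can be sketched as follows. This is a hedged illustration, not the thesis pipeline: the feature matrices are random stand-ins for real audio and lyric features, and the SVM is a placeholder classifier chosen only because it is a common choice for this kind of task.

```python
# Hypothetical sketch: 10-fold cross-validation accuracy for an audio
# feature set vs. a lyric feature set on one binary mood category.
# All data below is synthetic; dimensions and classifier are assumptions.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, n)                               # in category or not
X_audio = rng.normal(size=(n, 63)) + y[:, None] * 0.4   # audio stand-in
X_lyric = rng.normal(size=(n, 100)) + y[:, None] * 0.2  # lyric stand-in

# Accuracy averaged across the 10 folds, as in the table.
acc_audio = cross_val_score(SVC(), X_audio, y, cv=10).mean()
acc_lyric = cross_val_score(SVC(), X_lyric, y, cv=10).mean()
print(f"audio: {acc_audio:.3f}  lyrics: {acc_lyric:.3f}")
```

Repeating this per category, and then testing whether one feature set's fold accuracies are significantly higher than the other's, yields the kind of bold-marked comparison the table reports.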

