21.01.2014 Views

Improving Music Mood Classification Using Lyrics, Audio and Social Tags



selected. The 17 text statistic features defined in Table 6.2 are denoted as “TextStats” in Table 6.7. These statistics were kept unchanged in this experiment because their 17 dimensions were already compact compared to the 134 interjection words and punctuation marks. Since an SVM is used as the classifier, and a previous study (Yu, 2008) suggested that feature selection using SVM ranking worked best for SVM classifiers, the punctuation marks and interjection words were ranked according to the feature weights calculated by the SVM classifier. As in all experiments in this research, the results were averaged across a 10-fold cross validation, and feature ranking and selection were performed using only the training data in each fold. The results in Table 6.7 show that many of the interjection words and punctuation marks are indeed redundant. This is how the 25 TextStyle features in Table 6.2 were determined.
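The per-fold procedure described above (rank features by SVM weight on the training portion only, keep the top-ranked ones, then evaluate on the held-out portion) can be sketched as follows. This is a minimal illustration and not the implementation used in this research: the linear SVM is a simple full-batch subgradient trainer written with the standard library, the data are synthetic, and all function names (`train_linear_svm`, `rank_features`, `cross_validated_selection`) are hypothetical.

```python
import random

def train_linear_svm(X, y, epochs=200, lam=0.01, lr=0.1):
    """Full-batch subgradient descent on the L2-regularised hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # only margin violators contribute to the subgradient
                for j, xj in enumerate(xi):
                    grad_w[j] -= yi * xj / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def rank_features(X, y):
    """Order feature indices by the absolute value of their SVM weight."""
    w, _ = train_linear_svm(X, y)
    return sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)

def cross_validated_selection(X, y, top_k, n_folds=10):
    """Rank and select on each training fold only; test on the held-out fold."""
    indices = list(range(len(X)))
    accuracies, selected_per_fold = [], []
    for fold in range(n_folds):
        test_idx = indices[fold::n_folds]
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        # The ranking never sees the held-out samples.
        keep = rank_features(X_train, y_train)[:top_k]
        w, b = train_linear_svm([[row[j] for j in keep] for row in X_train], y_train)
        correct = 0
        for i in test_idx:
            score = sum(wj * X[i][j] for wj, j in zip(w, keep)) + b
            correct += (1 if score >= 0 else -1) == y[i]
        accuracies.append(correct / len(test_idx))
        selected_per_fold.append(keep)
    return sum(accuracies) / n_folds, selected_per_fold

# Synthetic data (an assumption for illustration): feature 0 carries the label,
# features 1 and 2 are small noise and should be ranked as redundant.
rng = random.Random(0)
y = [1 if (i // 10) % 2 == 0 else -1 for i in range(60)]
X = [[label * (1.0 + rng.uniform(0.0, 0.5)),
      rng.uniform(-0.1, 0.1),
      rng.uniform(-0.1, 0.1)] for label in y]

mean_accuracy, selected = cross_validated_selection(X, y, top_k=1)
print(mean_accuracy, selected)
```

Performing the ranking inside the fold loop is the point of the sketch: ranking on the full data set before splitting would let the held-out samples influence which features are kept.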

To provide a sense of how the top features are distributed across the positive and negative samples of the categories, the distributions for each of the 25 TextStyle features (six interjection words, two special punctuation marks and 17 text statistics) were plotted. Figure 6.1, Figure 6.2 and Figure 6.3 illustrate the distributions of three sample features: “hey,” “!,” and “numberOfWordsPerMinute.”

In these figures, the categories are in descending order of the number of songs in each category. As can be seen in the figures, the positive and negative bars for each category generally have uneven heights. The greater the difference, the more distinguishing power the feature has for that category.
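The comparison of positive and negative bars can be illustrated with a small sketch. It assumes, purely for illustration, that each bar shows the mean value of a feature over the positive or negative samples of a category; the page does not state which statistic the figures plot. The records, the category names and the helper names (`bar_heights`, `height_gaps`) are hypothetical.

```python
from statistics import mean

# Hypothetical records: (category, is_positive_sample, count of "!" in the lyrics)
records = [
    ("aggressive", True, 4.0), ("aggressive", True, 6.0),
    ("aggressive", False, 1.0), ("aggressive", False, 1.0),
    ("calm", True, 1.0), ("calm", True, 1.0),
    ("calm", False, 1.0), ("calm", False, 3.0),
]

def bar_heights(rows):
    """Return {category: (mean over positive samples, mean over negative samples)}."""
    grouped = {}
    for category, is_positive, value in rows:
        positives, negatives = grouped.setdefault(category, ([], []))
        (positives if is_positive else negatives).append(value)
    return {c: (mean(p), mean(n)) for c, (p, n) in grouped.items()}

def height_gaps(rows):
    """Absolute difference between the two bars of each category."""
    return {c: abs(p - n) for c, (p, n) in bar_heights(rows).items()}

gaps = height_gaps(records)
most_distinguished = max(gaps, key=gaps.get)
print(gaps, most_distinguished)
```

Under this assumption, a category with a larger gap is one for which the feature separates positive from negative samples more clearly.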
