21.01.2014 Views

improving music mood classification using lyrics, audio and social tags


other. Stylistic features, borrowed from stylometric analysis (Rudman, 1998), included punctuation marks, digit counts, POS counts, words per line, unique words per line, unique-word ratio, etc. The experiments showed that stylistic features were the best among individual lyric feature sets, but a combination of rhyme and stylistic features achieved the best performance in the task of genre classification. Both stylistic features alone and the combination of rhyme and stylistic features performed twice as well as the bag-of-words approach. The authors also compared results obtained with and without stemming and found no significant differences.
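To make the stylistic feature set concrete, the following is a minimal sketch of how a few of the features named above could be computed from raw lyric text. The function name `stylistic_features`, the tokenization rule, and the dictionary keys are assumptions made for illustration, not the definitions used in the cited study; POS counts are omitted because they require a part-of-speech tagger.

```python
import re
import string


def stylistic_features(lyrics: str) -> dict:
    """Compute a few simple stylistic features from lyric text (illustrative only)."""
    # Non-empty lines of the lyric; blank lines between stanzas are ignored.
    lines = [ln for ln in lyrics.splitlines() if ln.strip()]
    # Assumed tokenization: runs of letters and apostrophes, lowercased.
    token = re.compile(r"[A-Za-z']+")
    words = token.findall(lyrics.lower())

    n_lines = max(len(lines), 1)  # avoid division by zero on empty input
    n_words = max(len(words), 1)
    unique_per_line = [len(set(token.findall(ln.lower()))) for ln in lines]

    return {
        "punctuation_count": sum(ch in string.punctuation for ch in lyrics),
        "digit_count": sum(ch.isdigit() for ch in lyrics),
        "words_per_line": len(words) / n_lines,
        "unique_words_per_line": sum(unique_per_line) / n_lines,
        "unique_words_ratio": len(set(words)) / n_words,
    }
```

For the two-line input `"Hello, hello world!\nIt is 2 late, world."`, this sketch counts 4 punctuation marks, 1 digit, and 3.5 words per line. Each value would form one dimension of a song's feature vector alongside whatever other feature sets are being compared.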

In an unusual example among studies on music mood classification, Li and Ogihara (2004) compared several lyric feature types besides bag-of-words. They also drew on stylometric analysis and used function words, POS statistics and orthographic features of lexical items such as capitalization, word placement, word length, and line length.

In predicting hit songs, Dhanaraj and Logan (2005) converted the lyrics of each song to a vector using Probabilistic Latent Semantic Analysis (PLSA) (Hofmann, 1999). In PLSA, each dimension of the vector represents the likelihood that the song is about a pre-learned topic. Logan et al. (2004) showed that, for the task of artist similarity, the topics learned from a lyrics corpus were better than those learned from more general corpora such as news.
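As an illustration of how PLSA turns a song into a topic vector, the sketch below fits the standard PLSA model by expectation-maximization on a word-count matrix. It is a textbook-style sketch under stated assumptions, not the implementation used in the cited studies: the function name `plsa`, its parameters, and the list-of-lists input format are all hypothetical, and no smoothing or convergence check is included.

```python
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM. counts[d][w] is the count of word w in song d.

    Returns (p_z_d, p_w_z): per-song topic vectors P(z|d) and
    per-topic word distributions P(w|z).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        # Scale a row to sum to 1; fall back to uniform if it is all zeros.
        total = sum(row)
        if total <= 0:
            return [1.0 / len(row)] * len(row)
        return [x / total for x in row]

    # Random initialization of both distributions.
    p_z_d = [normalize([rng.random() for _ in range(n_topics)]) for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() for _ in range(n_words)]) for _ in range(n_topics)]

    for _ in range(n_iter):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w] == 0:
                    continue
                # E-step: posterior P(z | d, w), proportional to P(z|d) * P(w|z).
                post = normalize([p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)])
                # Accumulate expected counts for the M-step.
                for z in range(n_topics):
                    expected = counts[d][w] * post[z]
                    new_z_d[d][z] += expected
                    new_w_z[z][w] += expected
        # M-step: renormalize the expected counts into probabilities.
        p_z_d = [normalize(row) for row in new_z_d]
        p_w_z = [normalize(row) for row in new_w_z]
    return p_z_d, p_w_z
```

Each row of the returned `p_z_d` is the vector described above: one entry per learned topic, giving the estimated probability that the song is about that topic. In a setting like the one described, the topics would presumably be learned on a training corpus first and then held fixed when new songs are projected onto them, a step this sketch does not cover.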

As previous research on or using lyrics shows, bag-of-words features still dominate. Dimension reduction techniques and shallow linguistic features borrowed from stylometric analysis are also used. In addition, it is noteworthy that very few of the above studies compared the performance of different feature types.
