21.01.2014 Views

improving music mood classification using lyrics, audio and social tags


as feature values respectively. The tf-idf weighting model uses the product of term frequency
and inverse document frequency as feature values.
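
To make the weighting concrete, here is a minimal sketch of a tf-idf computation. The text does not give the exact formula used in the study, so the sketch assumes raw counts for term frequency and log(N / df) for inverse document frequency; the function name `tfidf_vectors` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. Assumed weighting (not confirmed by
    the source): weight = raw term count * log(N / document frequency).
    """
    n_docs = len(docs)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    vectors = []
    for tokens in docs:
        tf = Counter(tokens)  # raw term counts within this document
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


# Toy example: two tiny "lyrics" that are already tokenized.
docs = [["love", "love", "you"], ["you", "cry"]]
vecs = tfidf_vectors(docs)
```

Under this assumed formula, a term that appears in every document (here "you") receives weight zero, while a term concentrated in one document (here "love") is weighted up.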

The name “bag-of-words” simply means a collection of unordered terms, and the terms
could be single words (also called “unigrams”), POS tags, or ordered combinations of multiple
words (also called “n-grams”). In this study, unigrams, bigrams and trigrams of the above features
and representation models are all evaluated. For each n-gram feature type, features that occurred
fewer than five times in the training dataset were discarded. In addition, for bigrams and trigrams
of Content and Cont-stem, function words were not eliminated because content words are usually
connected via function words, as in “I love you,” where “I” and “you” are function words. For
Cont-stem, words were stemmed before bigrams and trigrams were calculated. That is, every
word in a bigram or trigram was stemmed.
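
The n-gram extraction and frequency cutoff described above can be sketched as follows. This is an illustration only: the helper names `ngrams` and `build_vocabulary` are hypothetical, tokenization and stemming are assumed to have been done beforehand, and "occurred fewer than five times" is interpreted here as total occurrences across the training set rather than the number of documents.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all ordered n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_vocabulary(train_docs, max_n=3, min_count=5):
    """Collect unigrams up to max_n-grams, keeping those seen at least min_count times.

    Assumption: counts are total occurrences over all training documents.
    """
    counts = Counter()
    for tokens in train_docs:
        for n in range(1, max_n + 1):
            counts.update(ngrams(tokens, n))
    return {gram for gram, count in counts.items() if count >= min_count}


# Toy training set: function words such as "i" and "you" are kept, so that
# bigrams and trigrams like ("i", "love", "you") can be formed.
train_docs = [["i", "love", "you"]] * 5 + [["i", "miss", "you"]]
vocab = build_vocabulary(train_docs)
```

In this toy data the trigram ("i", "love", "you") occurs five times and survives the cutoff, while everything involving "miss" occurs once and is discarded.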

Theoretically, higher-order n-grams can capture features of phrases and compound words. A
previous study on lyric mood classification (He et al., 2008) found the combination of unigrams,
bigrams and trigrams yielded the best results among all n-gram features (n
