Improving Music Mood Classification Using Lyrics, Audio and Social Tags


Feature concatenation and late fusion (linear interpolation) of classifiers were examined and compared. Finally, system performances on various numbers of training examples and different audio lengths were compared. The results indicate: 1) social tags can help identify mood categories suitable for a real world music listening environment; 2) the most useful lyric features are linguistic features combined with text stylistic features; 3) lyric features outperform audio features in terms of averaged accuracy across all considered mood categories; 4) systems combining lyrics and audio outperform audio-only and lyric-only systems; 5) combining lyrics and audio can reduce the requirement on training data size, both in number of examples and in audio length.
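As an illustration of the late-fusion strategy mentioned above, the sketch below blends the per-category probabilities of an audio-based and a lyrics-based classifier with a single interpolation weight. It is a minimal sketch under assumed inputs, not the dissertation's actual implementation; the function name, the example arrays, and the weight value are hypothetical.

```python
import numpy as np

def late_fusion(p_audio: np.ndarray, p_lyrics: np.ndarray, alpha: float) -> np.ndarray:
    """Linear interpolation of per-category probabilities from two classifiers.

    p_audio, p_lyrics: arrays of shape (n_songs, n_mood_categories)
    alpha: weight on the audio-based classifier, in [0, 1]
    """
    return alpha * p_audio + (1.0 - alpha) * p_lyrics

# Hypothetical example: 3 songs, 2 mood categories.
p_audio = np.array([[0.8, 0.2], [0.4, 0.6], [0.7, 0.3]])
p_lyrics = np.array([[0.6, 0.4], [0.2, 0.8], [0.9, 0.1]])

fused = late_fusion(p_audio, p_lyrics, alpha=0.5)
predicted_mood = fused.argmax(axis=1)  # index of the highest-probability mood per song
print(fused)
print(predicted_mood)
```

In practice the weight alpha would be tuned on a validation set so that the blend reflects how reliable each source is for a given mood category.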

Contributions of this research are threefold. On methodology, it improves the state of the art in music mood classification and text affect analysis in the music domain. The mood categories identified from empirical social tags can complement those in theoretical psychology models. In addition, many of the lyric text features examined in this study have never been formally studied in the context of music mood classification nor been compared to each other using a common dataset. On evaluation, the ground truth dataset built in this research is large and unique with ternary information available: audio, lyrics and social tags. Part of the dataset has been made available to the MIR community through the Music Information Retrieval Evaluation eXchange (MIREX) 2009 and 2010, the community-based evaluation framework. The proposed method of deriving ground truth from social tags provides an effective alternative to expensive human assessments of music and thus clears the way to large scale experiments. On application, findings of this research help build effective and efficient music mood classification and recommendation systems by optimizing the interaction of music audio and lyrics. A prototype of such systems can be accessed at http://moodydb.com.

