
the ultimate judge. Thus ground truth datasets in existing experiments were mostly built by recruiting human assessors to manually label music pieces, and then selecting the pieces with (near) consensus among the human judgments. However, judgments on music are highly subjective, and it is hard to achieve agreement across assessors (Skowronek, McKinney, & van de Par, 2006). This has seriously limited both the sizes of experimental datasets and the necessary validation of inter-assessor reliability. As a result, experimental datasets usually consist of merely several hundred music pieces, each judged by at most three human assessors and, in many cases, by only one (e.g., Trohidis, Tsoumakas, Kalliris, & Vlahavas, 2008; Li & Ogihara, 2003; Lu, Liu, & Zhang, 2006).
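Selecting pieces with (near) consensus presupposes a way to quantify how well assessors agree. As a concrete illustration, here is a minimal sketch that computes Cohen's kappa, a chance-corrected agreement statistic, between two assessors and then applies the consensus filter; the mood labels and judgments are invented for illustration and do not come from any of the cited studies.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two assessors, corrected for chance."""
    n = len(labels_a)
    # Observed agreement: fraction of pieces given the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Expected agreement if each assessor labelled at random according
    # to their own marginal label distribution.
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in freq_a)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical mood judgments from two assessors on the same six pieces.
assessor_1 = ["happy", "sad", "calm", "happy", "angry", "sad"]
assessor_2 = ["happy", "calm", "calm", "happy", "sad", "sad"]

print(f"kappa = {cohen_kappa(assessor_1, assessor_2):.2f}")  # kappa = 0.54

# The "(near) consensus" filter: keep only pieces the assessors agree on.
consensus = [i for i, (a, b) in enumerate(zip(assessor_1, assessor_2)) if a == b]
print(consensus)  # [0, 2, 3, 5]
```

With three or more assessors, Fleiss' kappa generalizes the same idea; either way, a kappa well below 1 shows how quickly subjective mood judgments erode the usable portion of a labeled collection.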

The situation is worsened by intellectual property regulations on music materials, which effectively prevent the sharing of ground truth datasets containing audio content among MIR researchers affiliated with different institutions. Therefore, it is clear that to advance development and evaluation in music mood classification, and in MIR research in general, a sound method is much needed for building ground truth sets of reliable quality in an efficient manner.

1.2.3 Multi-modal Classification

Until recent years, MIR systems focused on single-modal representations of music, mostly audio content and occasionally symbolic scores. The seminal work of Aucouturier and Pachet (2004) pointed out that there appeared to be a “glass ceiling” in audio-based MIR, because some high-level music features with semantic meanings might be too difficult to derive from audio using current technology. Hence, researchers started paying attention to multi-modal classification systems that combine audio and text (e.g., Neumayer & Rauber, 2007; Aucouturier, Pachet, Roy, & Beurivé, 2007; Dhanaraj & Logan, 2005; Müller, Kurth, Damm,

