
Improving Music Mood Classification Using Lyrics, Audio and Social Tags


CHAPTER 5: BUILDING A DATASET WITH TERNARY INFORMATION

The experiments described in Chapter 4 must be conducted against a ground truth dataset in which three information sources are available for every example: audio, lyrics and social tags. Audio and lyrics are used to build the classifiers, while social tags are used to assign ground truth labels to the examples in the dataset. This chapter describes how the data from these three sources were collected and preprocessed, and how the ground truth dataset was built with mood labels derived from social tags.

5.1 DATA COLLECTION

5.1.1 Audio Data

Audio is the most difficult of the three information sources to obtain, because of the intellectual property and copyright laws that apply to music materials. For this reason, data collection for this research started from the audio data accessible to the author. The author is affiliated with the International Music Information Retrieval Systems Evaluation Laboratory (IMIRSEL), where this dissertation research was conducted. IMIRSEL hosts MIREX each year and has accumulated multiple audio collections of significant size and diversity (see Table 5.1). The audio data used in this dissertation research were selected from the IMIRSEL collections.
