
Improving Music Mood Classification Using Lyrics, Audio and Social Tags

5.2.2 Selecting Songs

The next step is to select positive and negative examples for each of the 18 categories. The general idea is that if a song is frequently tagged with a term in a category, it should be selected as a positive example for that category. On the other hand, if a song is never tagged with any term in a category, but at the same time is heavily tagged with other tags (mood-related or not), then it should be taken as a negative example for the category. Therefore, the frequency or count of the social tags is crucial for this step.
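The selection rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure used in this study: the function `label_song`, the example category terms, and the thresholds `MIN_POSITIVE_COUNT` (standing in for "frequently tagged") and `MIN_OTHER_TAGS` (standing in for "heavily tagged", measured here as the number of distinct tags) are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional, Set

# Hypothetical thresholds, chosen only for illustration.
MIN_POSITIVE_COUNT = 10  # stands in for "frequently tagged" with a category term
MIN_OTHER_TAGS = 20      # stands in for "heavily tagged", as a number of distinct tags


def label_song(tag_counts: Dict[str, int], category_terms: Set[str],
               min_positive_count: int = MIN_POSITIVE_COUNT,
               min_other_tags: int = MIN_OTHER_TAGS) -> Optional[str]:
    """Label one song for one category: "positive", "negative", or None (unused)."""
    hits = [count for tag, count in tag_counts.items() if tag in category_terms]
    if hits:
        # Tagged with a category term: positive only if applied often enough.
        return "positive" if max(hits) >= min_positive_count else None
    # Never tagged with a category term: negative only if otherwise well tagged.
    return "negative" if len(tag_counts) >= min_other_tags else None


calm_terms = {"calm", "mellow", "soothing"}  # hypothetical category
print(label_song({"calm": 85, "rock": 100}, calm_terms))  # positive
print(label_song({"calm": 3, "rock": 100}, calm_terms))   # None
many_tags = {f"tag{i}": 50 for i in range(25)}
print(label_song(many_tags, calm_terms))                  # negative
```

Songs that match neither rule (weakly tagged with a category term, or too sparsely tagged overall) are simply left out of the category rather than forced into either class.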

5.2.2.1 Tag Count on Last.fm

The Last.fm API provides the 100 most popular tags applied to each song and the number of times each tag has been applied to that song (hereafter called "count"). To date, the API provides only normalized tag counts instead of real, absolute counts. For each song, the most popular tag gets count 100, and other tags get integer values between 0 and 100 proportional to the count of the most popular tag. Tags with count 0 are those appearing too few times compared to other tags associated with the song.
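The normalization described above can be illustrated with a small sketch. It assumes that scaled values are truncated (floored) to integers; the exact rounding rule used by Last.fm is not stated here, and both the function `normalize_counts` and the raw counts in the example are hypothetical.

```python
from typing import Dict


def normalize_counts(raw_counts: Dict[str, int]) -> Dict[str, int]:
    """Scale raw tag counts so that the most popular tag gets 100.

    Assumptions: at least one tag has a positive count, and scaled values are
    truncated (floored) to integers; this is an illustrative guess, not
    documented Last.fm behavior.
    """
    top = max(raw_counts.values())
    # Integer floor division keeps every value an integer in [0, 100].
    return {tag: 100 * count // top for tag, count in raw_counts.items()}


# Hypothetical raw counts for two songs.
song_a = {"sad": 400, "rock": 200, "love": 3}
song_b = {"sad": 4, "rock": 2}
print(normalize_counts(song_a))  # {'sad': 100, 'rock': 50, 'love': 0}
print(normalize_counts(song_b))  # {'sad': 100, 'rock': 50}
```

The two songs' raw counts for "sad" and "rock" differ by a factor of 100, yet their normalized counts are identical, so absolute counts cannot be recovered from the normalized values alone; a tag applied only a few times relative to the top tag ("love") drops to 0.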

In selecting songs for these categories, one should avoid songs that are tagged with a term only by accident or, worse, by mistake or mischief. Ideally, one should select songs with high counts. However, with only the normalized tag counts available, there is no way to calculate the real, absolute tag counts. Hence, a heuristic is used to ensure that a tag is picked up for a song only when it has been applied to that song more than once. Only songs satisfying one of the following conditions were counted as candidate positive songs in a category:
