21.01.2014 Views

improving music mood classification using lyrics, audio and social tags



entire labeled dataset is split into training and testing subsets, and an average performance can be evaluated with multiple randomized hold-out tests with the same train/test split ratio. Cross validation (CV) is a simple heuristic evaluation. In the setting of m-fold cross validation, a training set is randomly or strategically divided into m disjoint subsets (folds) of equal size. The classifier is trained m times, each time on the remaining m - 1 folds with a different fold held out as the testing set. The average performance over the m runs can then be calculated and evaluated. Values of m = 3, 5, and 10 are popular choices in MIR studies. For example, the AMC task in MIREX 2007 adopted a 3-fold cross validation. This dissertation research uses 10-fold cross validation.
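The m-fold procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in this research: the function names (`m_fold_indices`, `cross_validate`), the one-dimensional toy features, the two mood labels, and the nearest-centroid classifier are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import random

def m_fold_indices(n_items, m, seed=0):
    """Shuffle item indices and deal them into m disjoint folds of near-equal size."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::m] for i in range(m)]

def cross_validate(features, labels, m, train_fn, predict_fn, seed=0):
    """Train m times, each time holding out one fold as the testing set."""
    folds = m_fold_indices(len(labels), m, seed)
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held]
        model = train_fn([features[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, features[i]) == labels[i]
                      for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / m, accuracies

# Hypothetical toy classifier: one centroid per class on a single feature.
def train_centroids(xs, ys):
    sums = {}
    for x, y in zip(xs, ys):
        total, count = sums.get(y, (0.0, 0))
        sums[y] = (total + x, count + 1)
    return {y: total / count for y, (total, count) in sums.items()}

def predict_centroid(model, x):
    return min(model, key=lambda y: abs(model[y] - x))

# Made-up, well-separated data: 10 "calm" and 10 "angry" examples.
features = [i * 0.1 for i in range(10)] + [5 + i * 0.1 for i in range(10)]
labels = ["calm"] * 10 + ["angry"] * 10

mean_accuracy, fold_accuracies = cross_validate(
    features, labels, m=10, train_fn=train_centroids, predict_fn=predict_centroid
)
print(mean_accuracy, fold_accuracies)
```

The folds here are formed by random shuffling only. A "strategic" division, such as keeping class proportions equal across folds, would replace `m_fold_indices` and leave the rest of the loop unchanged.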

In comparing system performances, Friedman’s ANOVA will be applied to determine whether there are significant differences between the systems considered in each research question. Friedman’s ANOVA is a non-parametric test that does not require the sample data to be normally distributed, which matters here because accuracy data are rarely distributed normally (Downie, 2008). The samples used in the tests will be accuracies on individual mood categories, unless otherwise indicated.
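As a rough illustration of how such a comparison works, the sketch below computes the Friedman test statistic from a table in which each row is a mood category and each column is a system. The accuracy values are made up for the example, the function names are hypothetical, and the statistic omits the correction for tied ranks, so it should be read as a simplified sketch and not as the exact procedure applied in this research.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1..end+1
        for j in range(pos, end + 1):
            ranks[order[j]] = shared
        pos = end + 1
    return ranks

def friedman_statistic(table):
    """Friedman chi-square statistic (no tie correction).

    table[i][j] is the accuracy of system j on category i.
    """
    n = len(table)      # number of categories (blocks)
    k = len(table[0])   # number of systems (treatments)
    rank_sums = [0.0] * k
    for row in table:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)

# Made-up accuracies of three hypothetical systems on six mood categories.
accuracies = [
    [0.60, 0.65, 0.70],
    [0.55, 0.62, 0.66],
    [0.58, 0.57, 0.69],
    [0.61, 0.66, 0.72],
    [0.50, 0.59, 0.63],
    [0.64, 0.68, 0.71],
]

statistic = friedman_statistic(accuracies)
# With 3 systems the statistic is compared against a chi-square distribution
# with 2 degrees of freedom, whose critical value at the 0.05 level is 5.991.
significant = statistic > 5.991
print(statistic, significant)
```

Because the test works on within-category ranks and not on the raw accuracies, it makes no assumption that the accuracies are normally distributed. In practice a statistics library routine such as SciPy's `scipy.stats.friedmanchisquare` would normally be preferred, since it also handles ties and returns a p-value.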

4.2 CLASSIFICATION ALGORITHM AND IMPLEMENTATION

4.2.1 Supervised Learning and Support Vector Machines

A number of supervised learning algorithms have been developed and extensively adopted in both automatic text categorization and music classification. Supervised learning is a technique that learns a classification function or model from training data and then uses the function or model to classify new, unseen data. Common supervised learning algorithms include decision trees such as Quinlan’s ID3 and C4.5, K-Nearest Neighbors (KNN), the Naïve Bayesian algorithm,

