21.01.2014 Views

improving music mood classification using lyrics, audio and social tags


$$\text{accuracy} = \frac{TP + TN}{TP + FN + FP + TN}; \quad \text{precision} = \frac{TP}{TP + FP}; \quad \text{recall} = \frac{TP}{TP + FN}$$

$$F_\beta = \frac{(\beta^2 + 1) \cdot \text{precision} \cdot \text{recall}}{\beta^2 \cdot \text{precision} + \text{recall}}, \quad \beta \ge 0, \quad \text{where } \beta = \frac{\text{the importance of recall}}{\text{the importance of precision}}.$$

Usually β = 1, giving equal importance to precision and recall:

$$F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}.$$
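As a minimal sketch of the formulas above, the following Python function computes the four measures from the cells of a binary contingency table. The function name `binary_metrics`, the example counts, and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration; they do not come from the dissertation.

```python
def binary_metrics(tp, fn, fp, tn, beta=1.0):
    """Compute accuracy, precision, recall and F_beta from contingency counts.

    Assumption: a measure whose denominator is zero is reported as 0.0.
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F_beta = (beta^2 + 1) * P * R / (beta^2 * P + R)
    denom = beta ** 2 * precision + recall
    f_beta = (beta ** 2 + 1) * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_beta": f_beta,
    }


# Hypothetical counts, chosen only to exercise the formulas.
m = binary_metrics(tp=40, fn=10, fp=20, tn=30)
print(m)
```

With these hypothetical counts, accuracy is 70/100, precision is 40/60, recall is 40/50, and the default β = 1 gives the F 1 measure.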

Accuracy has been extensively adopted in binary classification evaluations in text categorization. In MIR, especially MIREX, accuracy has been commonly reported in evaluating classification tasks. Therefore, accuracy will be used as the classification performance measure in this dissertation research.

In evaluations of multiple categories, a concise and reliable measure of average performance is desirable. There are two approaches to calculating the average performance over all categories: micro-average and macro-average. Micro-average first gets the sums for all four cells in the contingency table (Table 4.1) across categories before calculating the final performance measure using the above formulas, while macro-average calculates the performance measures for each category and then takes the mean as the final score. Micro-averaging gives equal weight to each instance and therefore tends to be dominated by the classifier’s performance on big categories. Macro-averaging gives equal weight to each category, regardless of its size. Thus the two measures may give very different scores. This dissertation research puts equal emphasis on each mood category and thus macro-averaged measures are adopted for evaluation and comparison.
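The difference between the two averaging schemes can be illustrated with a short Python sketch. The helper names and the two contingency tables below are hypothetical, invented to show how a large category can dominate the micro-average; they are not data from this research.

```python
def accuracy(tp, fn, fp, tn):
    """Accuracy from the four cells of a binary contingency table."""
    return (tp + tn) / (tp + fn + fp + tn)


def macro_average(tables):
    """Compute accuracy per category, then take the mean over categories."""
    return sum(accuracy(*t) for t in tables) / len(tables)


def micro_average(tables):
    """Sum each cell across categories, then compute accuracy once."""
    tp, fn, fp, tn = (sum(cell) for cell in zip(*tables))
    return accuracy(tp, fn, fp, tn)


# Hypothetical (TP, FN, FP, TN) tables: one big category classified well,
# one small category classified poorly.
tables = [
    (90, 10, 10, 90),  # big category, accuracy 0.9
    (2, 8, 8, 2),      # small category, accuracy 0.2
]

macro = macro_average(tables)
micro = micro_average(tables)
print(f"macro = {macro:.3f}, micro = {micro:.3f}")
```

In this made-up case the macro-average is the mean of 0.9 and 0.2, whereas the micro-average is pulled toward the big category's score because it contributes ten times as many instances.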

In terms of splitting data into training and testing sets, both multiple randomized hold-out tests and cross validation are often used in MIR classification evaluations. In a hold-out test the
