05.03.2013 Views

PhD thesis - School of Informatics - University of Edinburgh


Appendix A. Evaluation Metrics and Notation

from a linguistic point of view. This is the motivation behind unlabelled dependency accuracy (Dep), the evaluation metric proposed by Lin (1995), which is based on comparing dependency tuples in parse and gold trees instead of labelled constituents and phrase boundaries. A sentence is represented in terms of a dependency tree where each word (apart from the head word of the sentence) is the modifier (M → D) of another word (its head, or H) based on a grammatical relationship. Therefore, a fully parsed sentence and its gold standard tree are each made up of N − 1 dependency tuples, where N is the number of words in the sentence. Dependency accuracy is calculated based on the number of dependents in the sentence that are assigned the same head as in the gold standard (H(M → D)_correct) as:

$$\mathrm{Dep} = \frac{H(M \rightarrow D)_{\mathrm{correct}}}{N - 1} \tag{A.11}$$

It is unlabelled, as the type of relation between the modifier and its head is not considered during evaluation. In order to perform dependency-based evaluation, the constituency trees that are output by the statistical parser must be converted into dependency trees. The conversion algorithm for this procedure is described in detail in Lin (1995).
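
As a minimal sketch of Equation A.11, the following Python function computes Dep for a single sentence. The input encoding is an assumption made for illustration and is not taken from the thesis: each tree is a list in which position i holds the 1-based index of the head of word i + 1, with 0 marking the head word of the sentence. The function name `dependency_accuracy` is likewise hypothetical.

```python
def dependency_accuracy(parse_heads, gold_heads):
    """Unlabelled dependency accuracy (Dep) for one sentence.

    Assumed encoding: heads[i] is the 1-based index of the head of
    word i + 1, and 0 marks the head word of the sentence.
    """
    if len(parse_heads) != len(gold_heads):
        raise ValueError("parse and gold must cover the same words")
    # Dependents are all words except the head word of the gold tree,
    # which gives the N - 1 dependency tuples of Equation A.11.
    dependents = [i for i, head in enumerate(gold_heads) if head != 0]
    if not dependents:
        raise ValueError("need at least one dependency tuple")
    # Count dependents whose head in the parse matches the gold head.
    correct = sum(1 for i in dependents if parse_heads[i] == gold_heads[i])
    return correct / len(dependents)


# Hypothetical four-word sentence whose head word is word 2.
gold = [2, 0, 2, 3]
parse = [2, 0, 2, 2]  # word 4 is attached to the wrong head
score = dependency_accuracy(parse, gold)  # 2 of 3 heads match
```

Relation labels play no part in the comparison, which is what makes the measure unlabelled.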

A.3.3 Bracketing Scores<br />

Parsing performance is also evaluated in terms of average crossing brackets (AvgCB), zero crossing brackets (0CB) and two or fewer crossing brackets (≤2CB). AvgCB is the average number of constituents per parse tree that cross the constituent boundaries of the gold tree, e.g. ((W1 W2) W3) versus (W1 (W2 W3)). 0CB is the percentage of sentences for which constituents are non-crossing and ≤2CB is the proportion of sentences whose constituents cross those of the gold parse tree at most twice.
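
The three bracketing scores can be sketched in Python as follows. This is an illustration under stated assumptions, not the evaluation code used in the thesis: constituents are encoded as half-open word spans `(start, end)`, a parse constituent counts once if it crosses any gold constituent, and both 0CB and ≤2CB are reported as percentages. All function names are hypothetical.

```python
def crosses(a, b):
    """True if spans a and b overlap without one containing the other."""
    return a[0] < b[0] < a[1] < b[1] or b[0] < a[0] < b[1] < a[1]


def crossing_brackets(parse_spans, gold_spans):
    """Number of parse constituents that cross some gold constituent."""
    return sum(1 for p in parse_spans if any(crosses(p, g) for g in gold_spans))


def bracketing_scores(sentence_pairs):
    """AvgCB, 0CB and <=2CB over (parse_spans, gold_spans) pairs."""
    counts = [crossing_brackets(p, g) for p, g in sentence_pairs]
    n = len(counts)
    return {
        "AvgCB": sum(counts) / n,
        "0CB": 100.0 * sum(c == 0 for c in counts) / n,
        "<=2CB": 100.0 * sum(c <= 2 for c in counts) / n,
    }


# ((W1 W2) W3) versus (W1 (W2 W3)): span (0, 2) crosses span (1, 3).
parse_a = [(0, 2), (0, 3)]
gold_a = [(1, 3), (0, 3)]
# A second hypothetical sentence whose parse matches the gold tree.
parse_b = [(0, 2), (0, 3)]
gold_b = [(0, 2), (0, 3)]
scores = bracketing_scores([(parse_a, gold_a), (parse_b, gold_b)])
```

With these two sentences, one bracket crosses in the first and none in the second, so AvgCB is 0.5, 0CB is 50% and ≤2CB is 100%.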

A.4 Statistical Tests<br />

When comparing the performance of the English inclusion classifier to that of another system, or to the baseline, Pearson’s chi-square (χ²) test is used to determine whether the difference is statistically significant.
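
To make the test concrete, here is a minimal Python sketch of Pearson's chi-square test on a 2 × 2 contingency table. The table layout (two systems by correct/incorrect decisions), the counts, the absence of a continuity correction and the function name are all assumptions for illustration; the text above does not specify how the tables were built. With one degree of freedom, the p-value can be computed from the complementary error function in the standard library.

```python
import math


def pearson_chi_square_2x2(a, b, c, d):
    """Pearson's chi-square statistic and p-value for a 2 x 2 table.

    Assumed layout: rows are two systems, columns are counts of
    correct and incorrect decisions. No continuity correction.
    """
    observed = ((a, b), (c, d))
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                raise ValueError("expected count of zero; test undefined")
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: system A gets 90 of 100 right, system B 70 of 100.
chi2, p_value = pearson_chi_square_2x2(90, 10, 70, 30)
significant = p_value < 0.05  # conventional threshold, also an assumption
```

For these hypothetical counts every expected cell value is 80 or 20, which gives χ² = 12.5 and a p-value well below 0.05.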
