PhD thesis - School of Informatics - University of Edinburgh
Appendix A. Evaluation Metrics and Notation
from a linguistic point of view. This is the motivation behind unlabelled dependency
accuracy (Dep), the evaluation metric proposed by Lin (1995), which is based on
comparing dependency tuples in parse and gold trees instead of labelled constituents
and phrase boundaries. A sentence is represented in terms of a dependency tree where
each word (apart from the head word of the sentence) is the modifier (M → D) of
another word (its head, or H) based on a grammatical relationship. Therefore, each fully
parsed sentence and its gold standard tree are made up of N − 1 dependency tuples,
where N is the number of words in the sentence. Dependency accuracy is calculated
based on the number of dependents in the sentence that are assigned the same head as
in the gold standard (H(M → D)_correct) as:
\[
\mathrm{Dep} = \frac{H(M \rightarrow D)_{\mathrm{correct}}}{N - 1} \tag{A.11}
\]
It is unlabelled, as the type of relation between the modifier and its head is not
considered during evaluation. In order to perform dependency-based evaluation, the
constituency trees that are output by the statistical parser must be converted into
dependency trees. The conversion algorithm for this procedure is described in detail in
Lin (1995).
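As an illustration of Equation A.11, the following is a minimal sketch of how unlabelled dependency accuracy could be computed for a single sentence once both trees are available as head assignments. The function name `dependency_accuracy`, the list-of-head-indices representation and the use of -1 to mark the head word of the sentence are assumptions made for this example; they are not taken from Lin (1995) or from the evaluation software used in the thesis.

```python
from typing import Sequence


def dependency_accuracy(parse_heads: Sequence[int], gold_heads: Sequence[int]) -> float:
    """Unlabelled dependency accuracy (Dep) for one sentence.

    Position i in each sequence holds the index of the head of word i.
    The head word of the sentence is marked with -1 (assumed convention).
    """
    if len(parse_heads) != len(gold_heads):
        raise ValueError("parse and gold must cover the same words")

    # The dependents are all words except the gold head word: N - 1 tuples.
    dependents = [i for i, head in enumerate(gold_heads) if head != -1]
    if not dependents:
        return 1.0  # one-word sentence: nothing to score (assumption)

    # Count dependents that receive the same head as in the gold standard.
    correct = sum(1 for i in dependents if parse_heads[i] == gold_heads[i])
    return correct / len(dependents)


# Hypothetical four-word sentence: word 2 is the head word of the sentence.
gold = [1, 2, -1, 2]
parse = [1, 2, -1, 1]  # the last word is attached to the wrong head
score = dependency_accuracy(parse, gold)
```

Here two of the three dependents receive the correct head, so `score` is 2/3. The relation labels never enter the computation, which is what makes the measure unlabelled.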
A.3.3 Bracketing Scores<br />
Parsing performance is also evaluated in terms of average crossing brackets (AvgCB),
zero crossing brackets (0CB) and two or fewer crossing brackets (≤2CB). AvgCB is
the average number of constituents in a parse tree that cross the constituent
boundaries of the gold tree, e.g. ((W1 W2) W3) versus (W1 (W2 W3)). 0CB is the percentage
of sentences in which no constituent crosses a gold constituent, and ≤2CB is the percentage of
sentences with two or fewer constituents that cross those of the gold parse tree.
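The three bracketing scores can be sketched as follows. This is a minimal illustration, assuming that each tree has already been reduced to a set of half-open word spans (start, end); the names `crosses`, `crossing_brackets` and `bracketing_scores` are hypothetical, and both 0CB and ≤2CB are reported as percentages here. It is not the scoring program used in the thesis.

```python
from typing import Dict, Iterable, List, Tuple

Span = Tuple[int, int]  # half-open word span [start, end)


def crosses(a: Span, b: Span) -> bool:
    """True if the spans overlap but neither contains the other."""
    return a[0] < b[0] < a[1] < b[1] or b[0] < a[0] < b[1] < a[1]


def crossing_brackets(parse: Iterable[Span], gold: Iterable[Span]) -> int:
    """Number of parse constituents that cross at least one gold constituent."""
    gold_spans = list(gold)
    return sum(1 for p in parse if any(crosses(p, g) for g in gold_spans))


def bracketing_scores(pairs: List[Tuple[List[Span], List[Span]]]) -> Dict[str, float]:
    """AvgCB, 0CB and <=2CB over a list of (parse spans, gold spans) pairs."""
    if not pairs:
        raise ValueError("at least one sentence is required")
    counts = [crossing_brackets(p, g) for p, g in pairs]
    n = len(counts)
    return {
        "AvgCB": sum(counts) / n,
        "0CB": 100.0 * sum(c == 0 for c in counts) / n,
        "<=2CB": 100.0 * sum(c <= 2 for c in counts) / n,
    }


# ((W1 W2) W3) versus (W1 (W2 W3)): the span (0, 2) crosses the gold span (1, 3).
parse_a, gold_a = [(0, 2), (0, 3)], [(1, 3), (0, 3)]
# A second, hypothetical sentence whose parse matches the gold tree exactly.
parse_b, gold_b = [(0, 2), (0, 3)], [(0, 2), (0, 3)]
scores = bracketing_scores([(parse_a, gold_a), (parse_b, gold_b)])
```

For these two sentences the first has one crossing bracket and the second has none, so AvgCB is 0.5, 0CB is 50.0 and ≤2CB is 100.0.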
A.4 Statistical Tests<br />
When comparing the performance of the English inclusion classifier to that of another
system, or to the baseline, Pearson’s chi-square (χ²) test is used to determine
whether the difference is statistically significant.
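As a rough illustration of such a comparison, the sketch below computes Pearson's chi-square statistic for a 2×2 contingency table of correct and incorrect decisions made by two systems. The table layout, the function name `chi_square_2x2`, the counts and the 0.05 significance level are all assumptions made for this example; the statistic is computed without a continuity correction. The critical value 3.841 is the standard χ² threshold for one degree of freedom at the 0.05 level.

```python
def chi_square_2x2(a_correct: int, a_wrong: int, b_correct: int, b_wrong: int) -> float:
    """Pearson's chi-square statistic for a 2x2 table (no continuity correction)."""
    observed = [[a_correct, a_wrong], [b_correct, b_wrong]]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    if total == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under the hypothesis that both systems perform alike.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


CRITICAL_VALUE_DF1_ALPHA_05 = 3.841  # chi-square critical value, 1 degree of freedom

# Hypothetical counts: system A gets 90 of 100 right, system B gets 70 of 100.
statistic = chi_square_2x2(90, 10, 70, 30)
significant = statistic > CRITICAL_VALUE_DF1_ALPHA_05
```

With these hypothetical counts the statistic is 12.5, which exceeds 3.841, so the difference would be judged significant at the assumed 0.05 level.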