PhD thesis - School of Informatics - University of Edinburgh
Chapter 5. Parsing English Inclusions
Phrase bracket (PB) frequency    BL     IE
PB_PRED > PB_GOLD                62%    51%
PB_PRED < PB_GOLD                11%    13%
PB_PRED = PB_GOLD                27%    36%

Table 5.6: Bracket frequency of the baseline (BL) and inclusion entity (IE) model output compared to the gold standard, i.e. the percentage of inclusion-set sentences in which the number of predicted phrase brackets (PB_PRED) is greater than, less than or equal to the number of gold-standard phrase brackets (PB_GOLD).
This analysis suggests that the annotation guidelines on foreign inclusions could be
improved when differentiating between phrase categories containing foreign material.
Despite the few inconsistencies and annotation errors discussed here, the large majority
of English inclusions are consistently annotated as either PN or CH phrases. In the
following, the errors in the parsing output of the inclusion set are examined in detail.
5.3.5.2 Phrase Bracketing<br />
Table 5.6 summarises the phrase bracketing errors made for the inclusion set. For the
majority of sentences (62%), the baseline model predicts more brackets than are present
in the gold standard parse tree. This proportion decreases by 11 percentage points to 51%
when parsing with the inclusion entity model. The baseline parser predicts fewer phrase
brackets than the gold standard in only 11% of sentences, a figure that increases slightly
to 13% for the inclusion entity model. Generally, these numbers suggest that the baseline
parser does not recognise English inclusions as constituents and instead parses their
individual tokens as separate phrases. Provided with additional information about
multi-word English inclusions in the training data, the parser is better able to deal with
this problem: the proportion of sentences with exactly as many brackets as the gold
standard rises from 27% to 36%. This conclusion is further substantiated in the next
section, which examines parsing errors specifically caused by English inclusions.
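The three-way comparison behind Table 5.6 can be illustrated with a short sketch. It assumes Penn-style bracketed trees with part-of-speech preterminals, and it counts a "phrase bracket" as any node that dominates at least one other node. The function names and the two example trees are hypothetical and were invented for illustration. They are not taken from the thesis data or from its evaluation tooling. In the example, the predicted tree splits the inclusion *Big Band* into two separate phrases, which yields one bracket more than the gold tree.

```python
"""Hypothetical sketch: compare predicted and gold phrase-bracket counts per sentence."""
from collections import Counter


def parse_tree(s):
    """Parse a bracketed string such as '(S (NP (ART Die)) ...)' into
    nested (label, children) tuples; leaves (words) are plain strings."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse():
        nonlocal pos
        assert tokens[pos] == "(", "expected opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return parse()


def count_phrase_brackets(tree):
    """Count phrasal nodes, i.e. nodes with at least one subtree child.
    Preterminals (a POS tag over a single word) are not counted."""
    _label, children = tree
    subtrees = [c for c in children if isinstance(c, tuple)]
    own = 1 if subtrees else 0
    return own + sum(count_phrase_brackets(c) for c in subtrees)


def compare(pred, gold):
    """Return '>', '<' or '=' for PB_PRED versus PB_GOLD of one sentence."""
    p = count_phrase_brackets(parse_tree(pred))
    g = count_phrase_brackets(parse_tree(gold))
    return ">" if p > g else "<" if p < g else "="


def bracket_table(pairs):
    """Percentage of (pred, gold) sentence pairs in each category (rounded)."""
    counts = Counter(compare(p, g) for p, g in pairs)
    n = len(pairs)
    return {k: round(100 * counts[k] / n) for k in (">", "<", "=")}


# Hypothetical example: the gold tree keeps the inclusion as one PN phrase,
# while the predicted tree parses each inclusion token as a separate NP.
EXAMPLE_GOLD = "(S (NP (ART Die) (PN (NE Big) (NE Band))) (VVFIN spielt))"
EXAMPLE_PRED = "(S (NP (ART Die) (NP (ADJA Big)) (NP (NN Band))) (VVFIN spielt))"
```

Here `count_phrase_brackets` gives 3 for `EXAMPLE_GOLD` and 4 for `EXAMPLE_PRED`, so `compare(EXAMPLE_PRED, EXAMPLE_GOLD)` falls into the PB_PRED > PB_GOLD row. Applying `bracket_table` to a whole test set produces percentages in the same layout as Table 5.6.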
5.3.5.3 Parsing Errors<br />
In order to understand the parser’s treatment of English inclusions, each parse tree is
analysed as to how accurately the baseline and inclusion entity models predict both
phrase bracketing and phrase categories (see Table 5.7). For 46 inclusions
(42.2%), the baseline parser makes an error with a negative effect on performance.
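A simplified version of this per-inclusion check can be sketched as follows. The sketch assumes that constituents are given as labelled token spans. An inclusion then counts as correctly parsed if some predicted phrase has both its span and its label. It is a category error if a predicted phrase has the span but a different label, and a bracketing error if no predicted phrase covers exactly that span. The `Span` type, the category names and the example spans are hypothetical. The thesis's own criterion for an error "with a negative effect on performance" may be more fine-grained than this.

```python
"""Hypothetical sketch: classify how a parser handled one gold inclusion constituent."""
from typing import Iterable, NamedTuple


class Span(NamedTuple):
    label: str  # phrase category, e.g. PN or CH
    start: int  # index of the first token
    end: int    # index one past the last token


def classify_inclusion(inclusion: Span, predicted: Iterable[Span]) -> str:
    """Return 'correct', 'category_error' or 'bracket_error'."""
    same_span = [s for s in predicted
                 if (s.start, s.end) == (inclusion.start, inclusion.end)]
    if any(s.label == inclusion.label for s in same_span):
        return "correct"          # bracketing and category both match
    if same_span:
        return "category_error"   # right bracket, wrong phrase category
    return "bracket_error"        # inclusion not predicted as one constituent


def error_rate(results):
    """Percentage of inclusions (one decimal) that were not parsed correctly."""
    errors = sum(r != "correct" for r in results)
    return round(100 * errors / len(results), 1)
```

For example, a gold inclusion `Span("PN", 1, 3)` is a bracketing error if the parser predicts only `Span("NP", 1, 2)` and `Span("NP", 2, 3)` over those tokens. It is a category error if the parser predicts `Span("NP", 1, 3)`. Running `error_rate` over the labels of all inclusions gives a figure comparable in kind to the 42.2% reported above.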