PhD thesis - School of Informatics - University of Edinburgh


Chapter 5. Parsing English Inclusions

Phrase bracket (PB) frequency    BL     IE
PBPRED > PBGOLD                  62%    51%
PBPRED < PBGOLD                  11%    13%
PBPRED = PBGOLD                  27%    36%

Table 5.6: Bracket frequency of the predicted baseline (BL) and the inclusion entity (IE) model output compared to the gold standard.

This analysis suggests that the annotation guidelines on foreign inclusions could be improved when differentiating between phrase categories containing foreign material. Despite the few inconsistencies and annotation errors discussed here, the large majority of English inclusions is consistently annotated as either a PN or a CH phrase. In the following, the errors in the parsing output of the inclusion set are examined in detail.

5.3.5.2 Phrase Bracketing<br />

Table 5.6 summarises the number of phrase bracketing errors made for the inclusion set. For the majority of sentences (62%), the baseline model predicts more brackets than are present in the gold standard parse tree. This number decreases by 11 percentage points to 51% when parsing with the inclusion entity model. The baseline parser predicts fewer phrase brackets in the output compared to the gold standard in only 11% of sentences. This number increases slightly to 13% for the inclusion entity model. Generally, these numbers suggest that the baseline parser does not recognise English inclusions as constituents and instead parses their individual tokens as separate phrases. Provided with additional information about multi-word English inclusions in the training data, the parser is able to overcome this problem. This conclusion is further substantiated in the next section, which examines parsing errors specifically caused by English inclusions.
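The bracket comparison behind Table 5.6 can be illustrated with a minimal sketch. It assumes parse trees are available as Penn-style bracketed strings and that a phrase bracket is any node dominating at least one other node (so part-of-speech preterminals are not counted). The tree format, the example sentence, the two trees, and all function names are hypothetical illustrations and are not taken from the thesis.

```python
import re
from collections import Counter

# Assumption: an opening bracket whose label is directly followed by another
# opening bracket starts a phrasal node; "(NE Peter)" is a preterminal and is skipped.
PHRASE_OPEN = re.compile(r"\(\s*[^\s()]+\s*(?=\()")


def count_phrase_brackets(tree: str) -> int:
    """Count phrasal (non-preterminal) nodes in a bracketed tree string."""
    return len(PHRASE_OPEN.findall(tree))


def compare(pred: str, gold: str) -> str:
    """Classify a sentence by predicted vs. gold phrase bracket count."""
    p, g = count_phrase_brackets(pred), count_phrase_brackets(gold)
    if p > g:
        return "PRED > GOLD"
    if p < g:
        return "PRED < GOLD"
    return "PRED = GOLD"


def bracket_frequency(pairs):
    """Percentage of sentences per category over (pred, gold) tree pairs."""
    counts = Counter(compare(pred, gold) for pred, gold in pairs)
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}


# Hypothetical gold tree: the two-token inclusion "Big Brother" forms one phrase.
GOLD = "(S (NP (NE Peter)) (VP (VVFIN sieht) (NP (FM Big) (FM Brother))))"
# Hypothetical baseline output: each inclusion token is parsed as its own phrase.
PRED = "(S (NP (NE Peter)) (VP (VVFIN sieht) (NP (FM Big)) (NP (FM Brother))))"

print(compare(PRED, GOLD))
print(bracket_frequency([(PRED, GOLD), (GOLD, GOLD)]))
```

In this toy pair the predicted tree has one more phrase bracket than the gold tree, which is the pattern the text attributes to the baseline parser splitting inclusion tokens into separate phrases. Equal bracket counts do not imply identical bracketing, so this count is only a coarse diagnostic.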

5.3.5.3 Parsing Errors<br />

In order to understand the parser’s treatment of English inclusions, each parse tree is analysed as to how accurate the baseline and inclusion entity models are at predicting both phrase bracketing and phrase categories (see Table 5.7). For 46 inclusions (42.2%), the baseline parser makes an error with a negative effect on performance.
