PhD thesis - School of Informatics - University of Edinburgh
Chapter 5. Parsing English Inclusions
the hypothesis that inclusions make parsing difficult, and this difficulty arises primarily
because the parser cannot detect inclusions. An anticipated upper bound is therefore
obtained by giving the parser perfect tagging information. Two further versions interface
with the English inclusion classifier and treat words marked as inclusions differently from
native words. The first version does so on a word-by-word basis. By contrast, the second
version, the inclusion entity approach, attempts to group inclusions even if a grouping
is not posited by phrase structure rules. Each version is now described in detail.
5.3.2.1 Perfect Tagging Model<br />
This model allows the parser to use perfect tagging information for all tokens, supplied
in the pre-terminal nodes. In the TIGER annotation, pre-terminals include not only POS
tags but also grammatical function labels. For example, rather than a pre-terminal node
having the category PRELS (relative pronoun), it is given the category PRELS-OA
(accusative relative pronoun) in the gold standard annotation. When given the POS tags
along with the grammatical functions, the perfect tagging parser could resolve more
syntactic ambiguity than when provided with perfect POS tags alone, which would give it
an unfair advantage. Therefore, to make this model more realistic, the parser is required
to guess the grammatical functions itself, allowing it to, for example, mistakenly tag an
accusative relative pronoun as a nominative, dative or genitive one. This setup gives the
parser access to the gold standard POS tags of English inclusions along with those of all
other words, but does not offer any additional hints about the syntactic structure of the
sentence as a whole.
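The separation of gold POS tags from grammatical function labels can be illustrated with a minimal Python sketch. It assumes that a pre-terminal category is written as the POS tag, a hyphen, and the function label (as in PRELS-OA), and that POS tags themselves contain no hyphen. The function names and the example tokens are hypothetical and are not taken from the parser used in this chapter.

```python
def strip_function_label(preterminal: str) -> str:
    """Return only the POS part of a pre-terminal such as 'PRELS-OA'.

    Assumption: the category has the form POS or POS-FUNCTION and the
    POS tag itself contains no hyphen.
    """
    pos, _sep, _function = preterminal.partition("-")
    return pos


def perfect_tagging_input(gold_preterminals):
    """Pair each word with its gold POS tag, withholding the function label."""
    return [(word, strip_function_label(cat)) for word, cat in gold_preterminals]


# Invented example tokens, for illustration only.
gold = [("den", "PRELS-OA"), ("Computer", "NN-NK"), ("kauft", "VVFIN")]
print(perfect_tagging_input(gold))
```

Under this setup the parser still receives one fixed POS tag per token, but it has to choose the function label (for instance OA) itself.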
5.3.2.2 Word-by-word Model<br />
The two remaining models both take advantage of information acquired from the English
inclusion classifier. To interface the classifier with the parser, each inclusion is
simply marked with a special FOM (foreign material) tag. The word-by-word parser
attempts to guess POS tags itself, much like the baseline. However, whenever it
encounters a FOM tag, it restricts itself to the set of POS tags observed for inclusions
during training (the tags listed in Table 5.1). When a FOM is detected, these and only
these POS tags are guessed; all other aspects of the parser remain the same.
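
The restriction applied to FOM-marked tokens amounts to filtering the candidate POS tags per token. The following Python sketch shows one way this could look, assuming the parser chooses among an explicit candidate set for each token. The function name and both tag sets are hypothetical placeholders; in particular, INCLUSION_TAGS does not reproduce the contents of Table 5.1.

```python
FOM = "FOM"  # marker assigned by the English inclusion classifier

# Placeholder tag sets for illustration only; the real inclusion set
# would be the tags observed for inclusions during training.
ALL_TAGS = frozenset({"NN", "NE", "ADJA", "VVFIN", "ART", "APPR"})
INCLUSION_TAGS = frozenset({"NN", "NE", "ADJA"})


def candidate_tags(marker, all_tags=ALL_TAGS, inclusion_tags=INCLUSION_TAGS):
    """Return the POS tags the parser may guess for one token.

    Tokens marked FOM are limited to the inclusion tag set; every other
    token keeps the full tag set, as in the baseline.
    """
    if marker == FOM:
        return inclusion_tags
    return all_tags


# Invented example sentence: only the middle token is marked as an inclusion.
sentence = [("das", None), ("Internet", FOM), ("boomt", None)]
for word, marker in sentence:
    print(word, sorted(candidate_tags(marker)))
```

Only the candidate set changes for marked tokens; scoring and all other parsing steps are left untouched, which mirrors the description above.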