05.03.2013 Views

PhD thesis - School of Informatics - University of Edinburgh


Chapter 5. Parsing English Inclusions

the hypothesis that inclusions make parsing difficult, and that this difficulty arises primarily because the parser cannot detect inclusions. Therefore, an anticipated upper bound is to give the parser perfect tagging information. Two further versions interface with the English inclusion classifier and treat words marked as inclusions differently from native words. The first version does so on a word-by-word basis. Conversely, the second version, the inclusion entity approach, attempts to group inclusions even if a grouping is not posited by phrase structure rules. Each version is now described in detail.

5.3.2.1 Perfect Tagging Model

This model allows the parser to make use of perfect tagging information for all tokens, given in the pre-terminal nodes. In the TIGER annotation, pre-terminals include not only POS tags but also grammatical function labels. For example, rather than a pre-terminal node having the category PRELS (relative pronoun), it is given the category PRELS-OA (accusative relative pronoun) in the gold standard annotation. When given the POS tags along with the grammatical functions, the perfect tagging parser would have an unfair advantage, as it could resolve more syntactic ambiguity than when provided with perfect POS tags alone. Therefore, to make this model more realistic, the parser is required to guess the grammatical functions itself, allowing it to, for example, mistakenly tag an accusative relative pronoun as a nominative, dative or genitive one. This setup gives the parser access to the gold standard POS tags of English inclusions along with those of all other words, but does not offer any additional hints about the syntactic structure of the sentence as a whole.
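The preparation of the perfect tagging input can be illustrated with a minimal sketch. It assumes that gold pre-terminal categories are strings of the form POS-FUNCTION (as in PRELS-OA) and that POS tags themselves contain no hyphen; the function names and the example tokens are hypothetical and are not taken from the parser used in this thesis.

```python
from typing import List, Tuple


def strip_function_label(preterminal: str) -> str:
    """Return only the POS part of a 'POS-FUNCTION' pre-terminal category.

    Assumption: the POS tag never contains a hyphen, so everything
    after the first hyphen is the grammatical function label.
    """
    pos, _, _function = preterminal.partition("-")
    return pos


def perfect_tagging_input(
    gold: List[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Pair each word with its gold POS tag, hiding the function label.

    The parser then sees the correct POS tag for every token but still
    has to guess the grammatical function itself.
    """
    return [(word, strip_function_label(cat)) for word, cat in gold]


# Hypothetical two-token fragment with gold pre-terminal categories.
gold_fragment = [("den", "PRELS-OA"), ("Computer", "NN-SB")]
parser_input = perfect_tagging_input(gold_fragment)
```

With this input the parser receives PRELS for the first token and remains free to attach any grammatical function to it, including an incorrect one.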

5.3.2.2 Word-by-word Model

The two remaining models both take advantage of information acquired from the English inclusion classifier. To interface the classifier with the parser, each inclusion is simply marked with a special FOM (foreign material) tag. The word-by-word parser attempts to guess POS tags itself, much like the baseline. However, whenever it encounters a FOM tag, it restricts itself to the set of POS tags observed for inclusions during training (the tags listed in Table 5.1). When a FOM tag is detected, these and only these POS tags are guessed; all other aspects of the parser remain the same.
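The tag restriction of the word-by-word model can be sketched as follows. This is an illustration under stated assumptions, not the parser's actual code: the function name, the lexicon argument and, in particular, the contents of INCLUSION_TAGS are hypothetical placeholders. The real set is the one listed in Table 5.1, which is not reproduced here.

```python
from typing import FrozenSet, Iterable, Set

# Placeholder only: stands in for the POS tags observed for inclusions
# during training (Table 5.1). These three tags are an assumption made
# for illustration, not the contents of that table.
INCLUSION_TAGS: FrozenSet[str] = frozenset({"NN", "NE", "FM"})


def candidate_tags(
    is_fom: bool,
    usual_candidates: Iterable[str],
    inclusion_tags: FrozenSet[str] = INCLUSION_TAGS,
) -> Set[str]:
    """Return the POS tags the parser may guess for one token.

    Tokens marked FOM by the inclusion classifier are limited to the
    inclusion tag set; all other tokens keep the candidates the
    baseline parser would consider anyway.
    """
    if is_fom:
        return set(inclusion_tags)
    return set(usual_candidates)


# A token marked FOM ignores its usual candidates entirely.
fom_candidates = candidate_tags(True, {"VVFIN", "ADJA"})
# A native token is treated exactly as in the baseline.
native_candidates = candidate_tags(False, {"VVFIN", "ADJA"})
```

Only the candidate set changes for FOM tokens; the rest of the parsing procedure is untouched, which mirrors the statement that all other aspects of the parser remain the same.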
