05.03.2013 Views

PhD thesis - School of Informatics - University of Edinburgh

Chapter 5. Parsing English Inclusions

5.3.5.1 Gold Standard Phrase Categories

Table 5.5 lists the different types of phrase categories surrounding the 109 multi-word English inclusions in the error analysis sample and their frequency (see footnote 6). The last column lists a typical example for each category. The figures illustrate that the majority of multi-word English inclusions are contained in a proper noun (PN) phrase, including names of companies, political parties, organisations, films, books, newspapers, etc. The components of PN phrases tend to be marked with the grammatical function PNC, proper noun component (Brants et al., 2002). A less frequent phrasal category of English inclusions is chunk (CH), which tends to be used for slogans, quotes or expressions like Made in Germany. The components of CH phrases are annotated with a grammatical function of type UC (unit component). Even in this small sample, the choice between PN and CH for an English inclusion can be misleading. For example, the organisation Friends of the Earth is annotated as PN, whereas another organisation, International Union for the Conservation of Nature, is marked as CH in the gold standard. The latter is believed to be an inconsistency in the annotation and should have been marked as PN as well.

The phrase category of an English inclusion with the syntactic function of a noun phrase which is neither a PN nor a CH is annotated as NP (noun phrase). One example is Peace Enforcement, which is not translated into German and is used rather like a buzzword in a sentence on UN missions. In this case, the POS tag of its individual tokens is NN (noun). The fact that this expression is not German is therefore lost in the gold standard annotation. Another example of an English inclusion NP in the gold standard is Framingham Heart Study, which could arguably be of phrase category PN. Furthermore, the sample contains an example of phrase category CH, Shopping Mall, which is an English noun phrase. The least frequent type of phrase category used for English inclusions is CNP (coordinated noun phrase). In this sample, this category marks a company name made up of a coordination, for example Botts and Company. The POS tags of the coordinated sisters are NE (named entity) and the English coordinating conjunction and is tagged as KON. Finally, there are also two cases where the English inclusion itself is not contained in a phrase category. One of them is Chief Executives, which is clearly an NP. These are believed to be annotation errors.
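The kind of tally discussed in this section can be sketched in a few lines of Python. The sketch below is illustrative only: it covers just the inclusions named in the running text, not the full sample of 109, so its counts are not the frequencies of Table 5.5. The names `GOLD_EXAMPLES`, `SUGGESTED_FIXES` and `tally_categories` are hypothetical, and the list of fixes contains only the two cases for which the text states a preferred category outright.

```python
from collections import Counter

# (inclusion, gold phrase category) pairs named in the text above.
# None means the inclusion is not contained in any phrase category.
# Assumption: this is only the handful of examples quoted in the prose,
# not the 109-item error analysis sample.
GOLD_EXAMPLES = [
    ("Friends of the Earth", "PN"),
    ("International Union for the Conservation of Nature", "CH"),
    ("Peace Enforcement", "NP"),
    ("Framingham Heart Study", "NP"),
    ("Shopping Mall", "CH"),
    ("Botts and Company", "CNP"),
    ("Chief Executives", None),
]

# Categories the text says the gold standard should have used.
SUGGESTED_FIXES = {
    "International Union for the Conservation of Nature": "PN",
    "Chief Executives": "NP",
}


def tally_categories(examples, fixes=None):
    """Count inclusions per phrase category, optionally applying fixes."""
    fixes = fixes or {}
    counts = Counter()
    for inclusion, category in examples:
        category = fixes.get(inclusion, category)
        counts[category if category is not None else "(none)"] += 1
    return counts


gold_counts = tally_categories(GOLD_EXAMPLES)
fixed_counts = tally_categories(GOLD_EXAMPLES, SUGGESTED_FIXES)

for label, counts in (("gold", gold_counts), ("with fixes", fixed_counts)):
    print(label, dict(counts))
```

Comparing the two tallies shows how a small number of inconsistent annotations shifts inclusions between CH, PN and NP even in this tiny set of examples.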

6 All phrase category (node) labels and grammatical function (edge) labels occurring in the TIGER treebank annotation are listed and defined in Appendix C.
