05.03.2013 Views

PhD thesis - School of Informatics - University of Edinburgh

PhD thesis - School of Informatics - University of Edinburgh

PhD thesis - School of Informatics - University of Edinburgh

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Chapter 2. Background and Theory 39<br />

S 461<br />

|<br />

+- DS_G 460<br />

|<br />

+- NG_G 6<br />

|<br />

|<br />

+- NG_SIMP_G 5<br />

|<br />

|<br />

+- PRON_G "er" [’?e:ˆ6] 1<br />

|<br />

+- V_G 155<br />

|<br />

|<br />

+- VST_G 153<br />

|<br />

|<br />

|<br />

+- VSIMP_G 152<br />

|<br />

|<br />

|<br />

+- VS_G 151<br />

|<br />

|<br />

|<br />

+- VS_E "surf" [’s3:f] 1<br />

|<br />

|<br />

+- VE_G "t#" [t#] 1<br />

|<br />

+- AN_G 289<br />

|<br />

+- PG_G 288<br />

|<br />

+- PG_SIMP_G 287<br />

|<br />

+- PREPC_G "im" [’?Im] 1<br />

|<br />

+- NGN_G 284<br />

|<br />

+- NG_E 34<br />

|<br />

+- MOD_REP_E 21<br />

|<br />

|<br />

+- MOD_REP_E 11<br />

|<br />

|<br />

|<br />

+- MOD_E 8<br />

|<br />

|<br />

|<br />

+- N_E 4<br />

|<br />

|<br />

|<br />

+- NS_E "world" [’w3:ld] 1<br />

|<br />

|<br />

|<br />

+- NE_E "#" [#] 1<br />

|<br />

|<br />

+- MOD_E 8<br />

|<br />

|<br />

+- ADJ_E 7<br />

|<br />

|<br />

+- AJ_E 4<br />

|<br />

|<br />

+- AS_E "wide" [’wa_Id] 1<br />

|<br />

|<br />

+- ASE_E "#" [#] 1<br />

|<br />

+- N_E 4<br />

|<br />

+- NS_E "web" [’web] 1<br />

|<br />

+- NE_E "#" [#] 1<br />

Figure 2.7: Mixed-lingual analyser output (Pfister and Romsdorfer, 2003)<br />

The interlingual homograph Lager could either refer to the German neuter noun, which<br />

has many different meanings including camp, or the English noun, i.e. a type <strong>of</strong> beer.<br />

Given the context, it is evident that we are dealing with an English inclusion. However,<br />

the analyser would give higher priority to the German variant.<br />

Pfister and Romsdorfer (2003) note that individual language grammars consist <strong>of</strong><br />

more than 500 rules whereas the inclusion grammar contains around 20 rules. While<br />

they state that the morpho-syntactic analyser is precise at detecting the language <strong>of</strong><br />

tokens and the sentence structure, they did not actually evaluate the performance <strong>of</strong><br />

their system. The reason is that the various lexica and grammars used by the rule-<br />

based morpho-syntactic analyser are relatively small and coverage is thus very limited<br />

(correspondence with authors). Results would be largely dominated by the words that<br />

are not covered by the morphological analyser and not measure the performance <strong>of</strong> the<br />

approach. It is therefore unclear how well the analyser performs on real mixed-lingual<br />

data. Although Pfister and Romsdorfer (2003) have taken an interesting approach to<br />

dealing with mixed-lingual data, a system working with larger lexica and grammars<br />

will be very expensive in terms <strong>of</strong> computational overhead and may fail when rules<br />

contradict each other. This method is also very costly considering that linguistic ex-<br />

perts have to write from scratch all the necessary grammars for each language scenario.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!