
Mining for Preposition-Noun Constructions in German

Katja Keßelmeier, Tibor Kiss, Antje Müller,
Claudia Roch, Tobias Stadtfeld, Jan Strunk

NODALIDA 2009 – Workshop on Extracting and Using Constructions in NLP
May 14, 2009, Odense, Denmark

Sprachwissenschaftliches Institut


Outline of the Talk

• Preposition-Noun Constructions in German
  • Definition
  • Problem
  • Properties
• Extracting Preposition-Noun Constructions
  • Corpus
  • Automatic annotation
  • Countability classification
  • Identification and extraction
  • Evaluation and discussion
• Outlook
  • Manual annotation
  • Further annotation cycles
  • Annotation mining

2


Preposition-Noun Constructions – Definition and Examples

• A Preposition-Noun Construction (PNC) is a prepositional phrase with a determinerless nominal complement (that would normally have to be accompanied by a determiner).
• Sometimes also called "determinerless PP"
• Examples
  auf Anfrage (after being asked), auf Aufforderung (on request),
  durch Beobachtung (through observation), in Anspielung (alluding to),
  mit Vorbehalt (with reservations), unter Androhung (under threat),
  ohne Vorwarnung (without warning)

3


Preposition-Noun Constructions – Problem

• Nouns that usually have to occur with a determiner can occur without one in PNCs.
• Singular count common nouns
  An sich habe ich ja den Grundsatz, dass ich nicht auf Anfrage blogge.
  ‘Normally I have the principle that I do not blog upon request.’
  (praegnanz.de/weblog/bloggen-auf-anfrage-ein-versuch)
  wir werden *(die) Anfrage unverzüglich bearbeiten
  ‘We will process the request immediately.’
  (http://www.golfenweltweit.de/noscan/Anfragen_villa_sp.htm)

4


Preposition-Noun Constructions – Problem

• The missing determiner in PNCs conflicts with descriptive grammar, generative theory and typological observations
  • Rule 442 from the German Duden Grammar (Duden 2005, p. 337)
    “Nouns with the feature combination countable plus singular thus always have an accompanying determiner with them […]”
  • “We assume that common nouns subcategorize for their determiners […].” (Pollard and Sag 1994, p. 49)
  • Languages typically require a singular count noun to be realized with a determiner if the language has a determiner system, a singular/plural, and a count/mass distinction. (Himmelmann 1998)

6


Preposition-Noun Constructions – Properties – Compositionality

• PNCs are often regarded as exceptions, as fixed expressions that can be listed (cf. Duden 2005, p. 306; Fleischer 1982) or as complex prepositions (Trawinski 2003; Trawinski et al. 2006)
• But the nouns contained in PNCs can be modified:
  [PP P [N′ ... N ...]]
  auf parlamentarische Anfrage (after being asked in parliament),
  durch kritische Beobachtung (through critical observation),
  mit leisem Vorbehalt (with quiet reservations),
  unter sanfter Androhung (under gentle threat)

7


Preposition-Noun Constructions – Properties – Compositionality

Comparison of collocational strength between the preposition unter (under) and singular and plural count nouns respectively using Dunning's log likelihood ratio (Kiss 2007)
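As a rough illustration of the measure named in the caption, Dunning's log likelihood ratio for a 2×2 contingency table of co-occurrence counts can be sketched as follows. The function name and the counts in the usage lines are invented for illustration; they are not taken from Kiss (2007).

```python
import math

def log_likelihood_ratio(k11, k12, k21, k22):
    """Dunning's G^2 statistic for a 2x2 contingency table.

    k11: preposition followed by the noun class of interest
    k12: preposition followed by anything else
    k21: other words followed by the noun class
    k22: other words followed by anything else
    """
    observed = ((k11, k12), (k21, k22))
    row_totals = (k11 + k12, k21 + k22)
    col_totals = (k11 + k21, k12 + k22)
    total = sum(row_totals)
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing (0 * ln 0 := 0)
                expected = row_totals[i] * col_totals[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2

# Hypothetical counts, for illustration only.
independent = log_likelihood_ratio(10, 20, 30, 60)  # rows are proportional
associated = log_likelihood_ratio(50, 10, 10, 50)   # strong association
```

A table whose rows are proportional yields a score of zero; the score grows as the observed counts depart from what independence would predict.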

8


Preposition-Noun Constructions – Properties – Productivity

Corpus size       N         V(N)      V(1,N)    E[V(1,N)]      P(N)             P(N)
(million words)   (tokens)  (types)   (hapax)   (exp. hapax)   (productivity)   (plural)
6                 551       110       74        96.66          0.175            0.451
12                1,086     164       104       123.47         0.114            0.438
26                2,185     267       169       166.41         0.076            0.347
52                4,336     435       282       249.93         0.058            0.288
78                6,343     564       384       332.19         0.052            0.241
107               8,526     684       469       400.43         0.047            0.217
213               16,427    1,086     746       748.81         0.046            0.197

Productivity of unter (under) and singular and plural count nouns respectively based on models by Baayen (2001), Evert (2004) and Dömges et al. (2007) (Kiss 2007)
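The P(N) (productivity) column is numerically consistent with dividing the expected number of hapaxes E[V(1,N)] by the token count N. The sketch below recomputes the column under that assumption; it is an observation about the reported numbers, not a statement from the slides about how the values were obtained.

```python
# Rows from the productivity table:
# (corpus size in million words, N tokens, E[V(1,N)], reported P(N))
ROWS = [
    (6, 551, 96.66, 0.175),
    (12, 1086, 123.47, 0.114),
    (26, 2185, 166.41, 0.076),
    (52, 4336, 249.93, 0.058),
    (78, 6343, 332.19, 0.052),
    (107, 8526, 400.43, 0.047),
    (213, 16427, 748.81, 0.046),
]

def productivity(expected_hapaxes, n_tokens):
    """Assumed reading of P(N): expected hapax count divided by token count."""
    return expected_hapaxes / n_tokens

# Recompute the column and round to the three decimals used in the table.
recomputed = [round(productivity(e, n), 3) for _, n, e, _ in ROWS]
reported = [p for _, _, _, p in ROWS]
```

Under this reading, productivity falls as the corpus grows but levels off at around 0.046, which matches the trend visible in the table.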

9


Preposition-Noun Constructions – Further Properties

• Productive and compositional, but restricted in ill-understood ways
  unter (der) Voraussetzung des Einverständnisses des Ensembles
  unter *(der) Prämisse des Einverständnisses des Ensembles
  ‘on (the) condition of the approval by the ensemble’
• Can often be replaced by regular PPs without a change in meaning
  Milosevic unterschrieb auch unter Androhung/der
  Milosevic signed even under threat_sg / the
  Androhung/Androhungen von NATO-Bombardementen nicht.
  threat / threats_pl of bomb attack-by-NATO not
  ‘Milosevic did not even sign under pain of NATO bomb attacks.’
• Speakers do not have good intuitions about PNCs and are reluctant to produce or judge new examples

10


Extracting Preposition-Noun Constructions – Corpus

• We use volumes 1993 to 1999 of Neue Zürcher Zeitung (NZZ), a German-language newspaper from Zürich.
  • This corpus comprises more than 208 million tokens.
  • A daily issue contains about 98,000 tokens on average.
• We are planning to at least double the size of our corpus
  • Volumes 2000 to 2004 of NZZ
  • Volumes 1986 to 1999 of taz (taken from the TüPP-D/Z corpus)

11


Extracting Preposition-Noun Constructions – Automatic Annotation

• Tokenization
• Sentence boundary detection (PUNKT system, Kiss and Strunk 2006)
• Heuristic recognition of document structure
  • Headlines
  • Paragraphs
  • Important since headlines allow telegraph style with anomalous lack of determiners
    Kanzlerin gibt Garantie für Spareinlagen
    ‘Chancellor provides guarantee for savings’
    (http://www.finareo.de/news/kanzlerin-gibt-garantie-fuer-spareinlagen.php)

12


Extracting Preposition-Noun Constructions – Automatic Annotation

• Regression Forest Tagger (RFT) (Schmid and Laws 2008)
  • Robust POS tagging with high accuracy
  • Morphological analysis and lemmatization based on SMOR (Schmid et al. 2004)
  • Trained on complete lexicon extracted from the corpus
  • Provides number feature for nouns
  • Preliminary evaluation resulted in over 97 % accuracy with respect to number, evaluated on a small sample of 500 nouns taken from NZZ 1995
• TreeTagger (Schmid 1995)
  • Additional POS annotation
  • Non-recursive minimal chunks
  • Most importantly preposition chunks (PC)

13


Extracting Preposition-Noun Constructions – Aggregation

• We aggregate the output of RFT and TT in one format
  • Inline XML format
• And define global IDs
  • Used across all tools and formats
  • Hierarchical
  • Human readable
  NZZ_1994_04_27_a32_seg5_s13_t4
• We will switch to stand-off formats
  • MMAX2 for manual annotation
  • PAULA in the long run

ID component                          Value
Name of newspaper                     NZZ
Year                                  1994
Month                                 04
Day                                   27
Number of article (in daily issue)    32
Number of segment (in article)        5
Number of sentence (in segment)       13
Number of token (in sentence)         4
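The hierarchical ID scheme can be illustrated with a small parser. This is a minimal sketch that assumes every ID follows exactly the pattern of the single example on the slide; the class, function and field names are hypothetical.

```python
import re
from typing import NamedTuple

class TokenId(NamedTuple):
    newspaper: str
    year: int
    month: int
    day: int
    article: int   # number of article in the daily issue
    segment: int   # number of segment in the article
    sentence: int  # number of sentence in the segment
    token: int     # number of token in the sentence

# Pattern inferred from the one example ID shown on the slide.
ID_PATTERN = re.compile(
    r"(?P<newspaper>[A-Za-z]+)_(?P<year>\d{4})_(?P<month>\d{2})_(?P<day>\d{2})"
    r"_a(?P<article>\d+)_seg(?P<segment>\d+)_s(?P<sentence>\d+)_t(?P<token>\d+)"
)

def parse_token_id(token_id: str) -> TokenId:
    """Split a global token ID into its hierarchical components."""
    match = ID_PATTERN.fullmatch(token_id)
    if match is None:
        raise ValueError(f"not a recognised token ID: {token_id!r}")
    fields = match.groupdict()
    newspaper = fields.pop("newspaper")
    numbers = {name: int(value) for name, value in fields.items()}
    return TokenId(newspaper=newspaper, **numbers)

parsed = parse_token_id("NZZ_1994_04_27_a32_seg5_s13_t4")
```

Because the ID encodes the full path from newspaper down to token, any tool in the pipeline can recover the sentence or article a token belongs to without a separate lookup table.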

14


Extracting Preposition-Noun Constructions – Inline XML Format

[Inline XML example; the markup itself is not preserved, only the token content]
[…] […] […]
unter Verweis
auf die laufenden Ermittlungen
[…]

15


Extracting Preposition-Noun Constructions – Countability Classification

• Building a training set
  • Manual annotation of 10,000 German noun lemmas for their countability preference (Allan 1980)
    • Auto (car) → +count
    • Wasser (water) → -count
  • Performed by four trained linguists
  • Lemmas that the annotators disagreed on were discarded
  • Lemmas that did not show a class-plausible ratio of singular and plural occurrences (based on RFT) were also dismissed
  • Result: 4,267 classified nouns (74 % +count, 26 % -count)

16


Extracting Preposition-Noun Constructions – Countability Classification

• Naïve Bayes Classifier
  • Probability of noun being +count given its context C: P(+count|C)
  • Context: immediately preceding token
    • RFT-POS
    • TT-POS
    • Lemma
  • Contexts are aggregated to classify noun types

Context (C)          +count   -count   P(C|+count)
PIAT PRO viel             0     1765   0.0005
KOKOM CONJ wie          327     1200   0.2145
VMFIN VFIN sollen        37      237   0.1376
ART ART einen           246       15   0.9391
PIAT PRO keine         4287     2969   0.5907
… 819 different contexts …

• Classification thresholds
  P(+count|C) < 0.35 → -count
  P(+count|C) > 0.5 → +count
  otherwise → unknown
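The probabilities in the table can be reproduced from the two count columns if one assumes add-one smoothing, i.e. (+count + 1) / (+count + -count + 2), truncated to four decimals. On that reading the last column behaves like P(+count|C), the quantity used in the thresholds. Both the smoothing and the truncation are inferences from the numbers, not something the slides state, and the sketch applies the thresholds to a single probability because the slides do not say how contexts are aggregated per noun type.

```python
import math

def p_count_given_context(n_count, n_mass):
    """Add-one smoothed share of +count nouns seen after one context (assumed)."""
    return (n_count + 1) / (n_count + n_mass + 2)

def classify(p_count, lower=0.35, upper=0.5):
    """Apply the classification thresholds from the slide."""
    if p_count < lower:
        return "-count"
    if p_count > upper:
        return "+count"
    return "unknown"

# (+count, -count) frequencies per context, copied from the table.
TABLE = {
    "PIAT PRO viel": (0, 1765),
    "KOKOM CONJ wie": (327, 1200),
    "VMFIN VFIN sollen": (37, 237),
    "ART ART einen": (246, 15),
    "PIAT PRO keine": (4287, 2969),
}

# Probabilities truncated to four decimals, stored as integers (x 10,000)
# to avoid floating-point comparison issues.
truncated = {
    context: math.floor(p_count_given_context(a, b) * 10_000)
    for context, (a, b) in TABLE.items()
}
labels = {
    context: classify(p_count_given_context(a, b))
    for context, (a, b) in TABLE.items()
}
```

A context such as the indefinite article einen points strongly towards +count, while viel points towards -count; contexts with a probability between 0.35 and 0.5 leave the decision open.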

17


Extracting Preposition-Noun Constructions – Countability Classification

• Decision Tree Classifier (J4.8)
  • Trained on 7,814 noun types
  • Using 10-fold cross-validation
  • Uses singular/total ratio of noun as sole feature
  • Accuracy over 90 %
  • Singular/total-ratio 0.98
    | Singular/total-ratio 0.997: -count

18


Extracting Preposition-Noun Constructions – Countability Classification

• Combination of classifiers
  • Classifiers have to agree
  • Otherwise the noun is classified as unknown
• Evaluated on 100 nouns classified as +count and 100 nouns classified as -count
  • Accuracy for +count: 93 %
  • Accuracy for -count: 88 %
• We plan to use bootstrapping to increase the size of the training set
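The agreement rule can be written down in a few lines. This is only a sketch of the rule as stated on the slide; the function name and label strings are assumptions.

```python
def combine(naive_bayes_label, decision_tree_label):
    """Keep a countability label only if both classifiers agree on it."""
    if naive_bayes_label == decision_tree_label:
        return naive_bayes_label
    return "unknown"

agreed = combine("+count", "+count")
disagreed = combine("+count", "-count")
```

Requiring agreement trades coverage for reliability: any noun on which the two classifiers differ ends up as unknown.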

19


Extracting Preposition-Noun Constructions – Extraction

• Using (Open) Corpus Workbench / Corpus Query Processor (Evert 2005)
  • Able to index and query shallow linguistic annotation
  • Can cope with huge amounts of data
    • Still could not index and search NZZ 1993-1999 in one go
  • Only minimal conversion necessary from inline XML format
    • Information about tokens → positional attributes
    • Information about larger units → structural attributes
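The mapping onto positional and structural attributes can be pictured with the verticalized text layout that the Corpus Workbench reads: one token per line with tab-separated positional attributes, and XML tags on lines of their own for structural attributes. The sketch below is an assumption-laden illustration; the attribute names are borrowed from the CQP query used in this talk, while the `s` element, its `id` attribute, the sample tag values and the `_` placeholder are hypothetical.

```python
from xml.sax.saxutils import quoteattr

# Positional attributes, in column order (names taken from the CQP query).
POSITIONAL = ("word", "rft_pos", "tt_pos", "rft_morph", "countability")

def to_vertical(sentences):
    """Render (sentence_id, tokens) pairs as verticalized text."""
    lines = []
    for sentence_id, tokens in sentences:
        # Larger units become structural attributes (XML tags on their own line).
        lines.append(f"<s id={quoteattr(sentence_id)}>")
        for token in tokens:
            # Token-level information becomes tab-separated positional attributes.
            lines.append("\t".join(token[name] for name in POSITIONAL))
        lines.append("</s>")
    return "\n".join(lines) + "\n"

# Hypothetical sample data; "_" marks an empty value.
sample = [(
    "NZZ_1994_04_27_a32_seg5_s13",
    [
        {"word": "unter", "rft_pos": "APPR", "tt_pos": "APPR",
         "rft_morph": "_", "countability": "_"},
        {"word": "Verweis", "rft_pos": "N", "tt_pos": "NN",
         "rft_morph": "Dat.Sg.Masc", "countability": "count"},
    ],
)]
vertical = to_vertical(sample)
```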

20


Extracting Preposition-Noun Constructions – Extraction

• CQP query

  [(word="an" %cd) & (rft_pos="APPR")]
  [(rft_pos!="(ART|…)") & (tt_pos!="(ART|…)")]*
  [(tt_pos="NN") & (rft_morph!=".*\.Pl\..*") & (countability="count")]

• List of 23 prepositions that typically take noun complements and assign them case
  • an, auf, bei, binnen, dank, durch, für, gegen, gemäß, hinter, in, mit, mittels, nach, neben, ohne, seit, über, um, unter, vor, während, wegen
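To make the logic of the query easier to follow, here is a rough Python approximation over a list of annotated tokens. It is not the CQP implementation: it reports only the first qualifying noun after each preposition, handles case but not diacritics (unlike `%cd`), and knows only the determiner tag `ART` because the remaining tags are elided in the query. The token dictionaries, the helper `make_token` and the sample tag values are hypothetical.

```python
import re

# Assumption: only "ART" is known; the query elides the rest ("ART|…").
DETERMINER_TAGS = {"ART"}
PLURAL_MORPH = re.compile(r".*\.Pl\..*")

def find_pnc_candidates(tokens, preposition):
    """Return (preposition_index, noun_index) pairs for putative PNCs."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok["word"].lower() != preposition or tok["rft_pos"] != "APPR":
            continue
        for j in range(i + 1, len(tokens)):
            nxt = tokens[j]
            is_target_noun = (
                nxt["tt_pos"] == "NN"
                and not PLURAL_MORPH.fullmatch(nxt["rft_morph"])
                and nxt["countability"] == "count"
            )
            if is_target_noun:
                hits.append((i, j))
                break
            if nxt["rft_pos"] in DETERMINER_TAGS or nxt["tt_pos"] in DETERMINER_TAGS:
                break  # a determiner intervenes, so this is not a PNC
    return hits

def make_token(word, rft_pos, tt_pos, rft_morph="", countability=""):
    """Build one annotated token (hypothetical layout)."""
    return {"word": word, "rft_pos": rft_pos, "tt_pos": tt_pos,
            "rft_morph": rft_morph, "countability": countability}

bare = [
    make_token("auf", "APPR", "APPR"),
    make_token("parlamentarische", "ADJA", "ADJA", "Pos.Acc.Sg.Fem"),
    make_token("Anfrage", "N", "NN", "Acc.Sg.Fem", "count"),
]
with_determiner = [
    make_token("auf", "APPR", "APPR"),
    make_token("die", "ART", "ART", "Def.Acc.Sg.Fem"),
    make_token("Anfrage", "N", "NN", "Acc.Sg.Fem", "count"),
]
bare_hits = find_pnc_candidates(bare, "auf")
determiner_hits = find_pnc_candidates(with_determiner, "auf")
```

The modified example auf parlamentarische Anfrage is found because the adjective is not a determiner, whereas auf die Anfrage is rejected.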

21


Extracting Preposition-Noun Constructions – Evaluation

• Creating a Gold Standard
  • One daily issue of NZZ chosen randomly
    • April 3, 1997
    • 6,081 sentences and 91,357 tokens
  • Extraction of all occurrences of the 23 prepositions
    • 5,304 hits
    • Based only on their word form
  • Manually marked all true PNCs
    • 161 true PNCs

22


Extracting Preposition-Noun Constructions – Evaluation

• Test set
  • Extraction of putative PNCs headed by the 23 prepositions using the CQP query and the automatic annotation
• Results

                       Hits   Precision   Recall    F-Measure
  +count                 56   48.21 %     16.77 %   24.88 %
  +count and unknown    610   23.44 %     88.82 %   37.09 %

• Using -count as a knockout criterion (i.e. also accepting nouns of unknown countability) increases recall more than fivefold
• High recall is more important to us than high precision
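The figures in the results table can be checked against the gold standard of 161 true PNCs. The true-positive counts used below (27 and 143) are not given on the slides; they are inferred from the reported hits and precision values, so treat them as an assumption.

```python
def precision_recall_f(true_positives, hits, gold_total):
    """Precision, recall and balanced F-measure for one extraction run."""
    precision = true_positives / hits
    recall = true_positives / gold_total
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

GOLD_PNCS = 161  # true PNCs in the gold standard

# run name -> (inferred true positives, reported hits)
RUNS = {
    "+count": (27, 56),
    "+count and unknown": (143, 610),
}

# Scores as percentages, rounded to two decimals as in the table.
scores = {
    name: tuple(round(100 * value, 2)
                for value in precision_recall_f(tp, hits, GOLD_PNCS))
    for name, (tp, hits) in RUNS.items()
}
recall_gain = scores["+count and unknown"][1] / scores["+count"][1]
```

With these counts the recomputed percentages match the table, and the ratio of the two recall values is about 5.3, in line with the "more than fivefold" increase.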

23


Extracting Preposition-Noun Constructions – Discussion

• Low recall
  • Data sparseness → PNCs are indeed productive!
  • Very cautious strategy of countability classification resulting in a low coverage of the countability classifier (cf. Stadtfeld et al. 2009)
  → Be less cautious and increase amount of training data for countability classification
• Less than optimal precision
  • Determinerless nouns in headlines
  • Idiomatic preposition noun combinations
  • Binominal constructions such as unter Dach und Fach
    under roof and case
    ‘done and dusted’
  • Chunking errors
  → Try to exclude headlines and binominal constructions by refining CQP query

24


Outlook – Manual Annotation

• Extraction of sentence IDs from the output of CQP
• Aggregation in small batches of 50 sentences as convenient working packages
• Manual annotation
  • Tool: MMAX2 (Müller and Strube 2006)
    • Stand-off annotation
    • Unlimited number of annotation layers that can be added incrementally
    • Markables and pointer relations between them
    • Highly customizable
• Levels of manual annotation
  • Lexical semantics of noun and preposition
  • Morphological complexity and etymological status of the noun
  • Valency of the noun
  • Metadata
  • …
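The first two steps (collecting sentence IDs and grouping them into packages of 50) could look roughly like this. The sketch assumes that the CQP output has already been reduced to a list of global token IDs of the form shown earlier in the talk; the function names and the generated sample IDs are hypothetical.

```python
def sentence_id(token_id):
    """Drop the trailing token part (e.g. '_t4') of a global token ID."""
    return token_id.rsplit("_t", 1)[0]

def batches(items, size=50):
    """Group unique items, in first-seen order, into lists of at most `size`."""
    unique = list(dict.fromkeys(items))
    return [unique[i:i + size] for i in range(0, len(unique), size)]

# Hypothetical hits: two tokens in each of 120 different sentences.
hit_tokens = [
    f"NZZ_1994_04_27_a32_seg5_s{sentence}_t{token}"
    for sentence in range(1, 121)
    for token in (1, 4)
]
packages = batches(sentence_id(token_id) for token_id in hit_tokens)
```

Deduplicating before batching keeps a sentence with several hits from landing in more than one working package.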

25


Outlook – Manual Annotation – MMAX2

26


Outlook – Further Annotation Cycles

[Diagram of the annotation cycle; labels as they appear on the slide]
• corpora
• count-mass: Naïve Bayes, Weka J48
• automatic annotation: Tagging (RFT, TT), Lemmatizer, SMOR Inflectional Morphology, Phrasal Chunks (TT)
• stand-off annotation: [P … N]; PP, NP
• manual annotation: subcategorization, lexical semantics, lexeme properties
• analysis: Weka, Principal Component Analysis, Linear Modelling
• PP, NP; [P … N]

27


Conclusion

• Work on relatively rare, linguistically “non-core” and partly irregular constructions such as PNCs
  • Cannot be based on speaker intuitions
  • Necessitates the use of large corpora and annotation mining (Chiarcos et al. 2008)
• Use of automatic annotation tools and countability classification shows great promise for this kind of research
• There is still much room for improvement in the extraction of PNCs
• Many practical problems
  • Amount of data too large for many tools
  • Much of the work consists in converting between data formats
  • Difficult to build and maintain the infrastructure for such a data-intensive linguistic research project

28


Acknowledgements

• The research reported here was partially funded by the Deutsche Forschungsgemeinschaft (DFG) under grant no. KI 759/5-1.
• We would very much like to thank
  • Stefanie Dipper
  • Helmut Schmid

29


Literature

Keith Allan. 1980. Nouns and countability. Language 56(3): 541-567.

R. Harald Baayen. 2001. Word Frequency Distributions. Kluwer, Dordrecht.

Timothy Baldwin, John Beavers, Leonoor van der Beek, Francis Bond, Dan Flickinger, and Ivan A. Sag. 2006. In search of a systematic treatment of determinerless PPs. In Patrick Saint-Dizier, editor, Syntax and Semantics of Prepositions. Springer, Dordrecht, pages 163-179.

Christian Chiarcos, Stefanie Dipper, Michael Götze, Ulf Leser, Anke Lüdeling, Julia Ritz, and Manfred Stede. 2008. A flexible framework for integrating annotations from different tools and tagsets. Traitement Automatique des Langues. Special Issue Platforms for Natural Language Processing. ATALA, 49 (2).

Florian Dömges, Tibor Kiss, Antje Müller, and Claudia Roch. 2007. Measuring the productivity of determinerless PPs. In Proceedings of the ACL 2007 Workshop on Prepositions, pages 31-37, Prague, Czech Republic.

Stefan Evert. 2004. A Simple LNRE Model for Random Character Sequences. In Proceedings of the 7ème Journées Internationales d'Analyse Statistique des Données Textuelles, pages 411-422, Louvain-la-Neuve, Belgium.

Stefan Evert. 2005. The CQP Query Language Tutorial (CWB version 2.2.b90). Institut für Maschinelle Sprachverarbeitung, University of Stuttgart.



Wolfgang Fleischer. 1982. Phraseologie der deutschen Gegenwartssprache. VEB, Leipzig.

Nikolaus Himmelmann. 1998. Regularity in irregularity: Article use in adpositional phrases. Linguistic Typology, 2:315–353.

Tibor Kiss. 2007. Produktivität und Idiomatizität von Präposition-Substantiv-Sequenzen. Zeitschrift für Sprachwissenschaft, 26(2): 317-345.

Tibor Kiss and Jan Strunk. 2006. Unsupervised multilingual sentence boundary detection. Computational Linguistics, 32(4): 485-525.

Tom Mitchell. 1997. Machine Learning. McGraw-Hill, New York.

Christoph Müller and Michael Strube. 2006. Multilevel annotation of linguistic data with MMAX2. In Sabine Braun, Kurt Kohn, and Joybrato Mukherjee, editors, Corpus Technology and Language Pedagogy. New Resources, New Tools, New Methods. (English Corpus Linguistics Vol. 3). Peter Lang, Frankfurt, pages 197-214.

Randolph Quirk, Sidney Greenbaum, Geoffrey Leech, and Jan Svartvik. 1985. A Comprehensive Grammar of the English Language. Longman, London.

Helmut Schmid. 1995. Improvements in part-of-speech tagging with an application to German. In Proceedings of the EACL SIGDAT Workshop, Dublin, Ireland.


Helmut Schmid, Arne Fitschen, and Ulrich Heid. 2004. SMOR: A German computational morphology covering derivation, composition, and inflection. In Proceedings of LREC 2004, pages 1263-1266, Lisbon, Portugal.

Helmut Schmid and Florian Laws. 2008. Estimation of conditional probabilities with decision trees and an application to fine-grained POS tagging. In Proceedings of COLING 2008, Manchester, UK.

Tobias Stadtfeld, Tibor Kiss, Antje Müller, and Claudia Roch. 2009. Chaining classifiers to determine noun countability. Submitted to ACL 2009, Singapore.

Beata Trawinski. 2003. The syntax of “Complex Prepositions” in German: An HPSG Approach. In Proceedings of GLIP, volume 5, pages 155–166.

Beata Trawinski, Manfred Sailer, and Jan-Philipp Soehn. 2006. Combinatorial Aspects of Collocational Prepositional Phrases. In Patrick Saint-Dizier, editor, Syntax and Semantics of Prepositions (Text, Speech and Language Technology 29), pages 181-196, Springer.
