Mining for Preposition- Noun Constructions in German
Mining for Preposition- Noun Constructions in German
Mining for Preposition- Noun Constructions in German
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
<strong>M<strong>in</strong><strong>in</strong>g</strong> <strong>for</strong> <strong>Preposition</strong>-<br />
<strong>Noun</strong> <strong>Constructions</strong><br />
<strong>in</strong> <strong>German</strong><br />
Katja Keßelmeier, Tibor Kiss, Antje Müller,<br />
Claudia Roch, Tobias Stadtfeld, Jan Strunk<br />
NODALIDA 2009 – Workshop on Extract<strong>in</strong>g<br />
and Us<strong>in</strong>g <strong>Constructions</strong> <strong>in</strong> NLP<br />
May 14, 2009, Odense, Denmark<br />
Sprachwissenschaftliches Institut
Outl<strong>in</strong>e of the Talk<br />
� <strong>Preposition</strong>-<strong>Noun</strong> <strong>Constructions</strong> <strong>in</strong> <strong>German</strong><br />
� Def<strong>in</strong>ition<br />
� Problem<br />
� Properties<br />
� Extract<strong>in</strong>g <strong>Preposition</strong>-<strong>Noun</strong> <strong>Constructions</strong><br />
� Corpus<br />
� Automatic annotation<br />
� Countability classification<br />
� Identification and extraction<br />
� Evaluation and discussion<br />
� Outlook<br />
� Manual annotation<br />
� Further annotation cycles<br />
� Annotation m<strong>in</strong><strong>in</strong>g<br />
2
<strong>Preposition</strong>-<strong>Noun</strong> <strong>Constructions</strong> –<br />
Def<strong>in</strong>ition and Examples<br />
� A <strong>Preposition</strong>-<strong>Noun</strong> Construction (PNC) is a prepositional phrase<br />
with a determ<strong>in</strong>erless nom<strong>in</strong>al complement (that would normally have<br />
to be accompanied by a determ<strong>in</strong>er).<br />
� Sometimes also called „determ<strong>in</strong>erless PP‟<br />
� Examples<br />
auf Anfrage (after be<strong>in</strong>g asked), auf Auf<strong>for</strong>derung (on request),<br />
durch Beobachtung (through observation), <strong>in</strong> Anspielung (allud<strong>in</strong>g to),<br />
mit Vorbehalt (with reservations), unter Androhung (under threat),<br />
ohne Vorwarnung (without warn<strong>in</strong>gs)<br />
3
<strong>Preposition</strong>-<strong>Noun</strong> <strong>Constructions</strong> –<br />
Problem<br />
� <strong>Noun</strong>s that usually have to occur with a determ<strong>in</strong>er can<br />
occur without one <strong>in</strong> PNCs.<br />
� S<strong>in</strong>gular count common nouns<br />
An sich habe ich ja den Grundsatz, dass ich nicht auf Anfrage blogge.<br />
‘Normally I have the pr<strong>in</strong>ciple that I do not blog upon request.’<br />
(praegnanz.de/weblog/bloggen-auf-anfrage-e<strong>in</strong>-versuch)<br />
wir werden *(die) Anfrage unverzüglich bearbeiten<br />
‘We will process the request immediately.’<br />
(http://www.golfenweltweit.de/noscan/Anfragen_villa_sp.htm)<br />
4
<strong>Preposition</strong>-<strong>Noun</strong> <strong>Constructions</strong> –<br />
Problem<br />
� The miss<strong>in</strong>g determ<strong>in</strong>er <strong>in</strong> PNCs conflicts with descriptive<br />
grammar, generative theory and typological observations<br />
� Rule 442 from the <strong>German</strong> Duden Grammar (Duden 2005, p. 337)<br />
“<strong>Noun</strong>s with the feature comb<strong>in</strong>ation countable plus s<strong>in</strong>gular thus<br />
always have an accompany<strong>in</strong>g determ<strong>in</strong>er with them […]”<br />
� “We assume that common nouns subcategorize <strong>for</strong> their<br />
determ<strong>in</strong>ers […].” (Pollard and Sag 1994, p. 49)<br />
� Languages typically require a s<strong>in</strong>gular count noun to be realized<br />
with a determ<strong>in</strong>er if the language has a determ<strong>in</strong>er system,<br />
a s<strong>in</strong>gular/plural, and a count/mass dist<strong>in</strong>ction.<br />
(Himmelmann 1998)<br />
6
<strong>Preposition</strong>-<strong>Noun</strong> <strong>Constructions</strong> –<br />
Properties – Compositionality<br />
� PNCs are often regarded as exceptions, as fixed expressions<br />
that can be listed (cf. Duden 2005, p. 306; Fleischer 1982) or<br />
as complex prepositions (Traw<strong>in</strong>ski 2003; Traw<strong>in</strong>ski et al. 2006)<br />
� But the nouns conta<strong>in</strong>ed <strong>in</strong> PNCs can be modified:<br />
[ PP P [ N‟ ... N ...]]<br />
auf parlamentarische Anfrage (after be<strong>in</strong>g asked <strong>in</strong> parliament),<br />
durch kritische Beobachtung (through critical observation),<br />
mit leisem Vorbehalt (with quiet reservations),<br />
unter sanfter Androhung (under gentle threat)<br />
7
<strong>Preposition</strong>-<strong>Noun</strong> <strong>Constructions</strong> –<br />
Properties – Compositionality<br />
Comparison of collocational strength between the preposition unter (under) and s<strong>in</strong>gular<br />
and plural count nouns respectively us<strong>in</strong>g Dunn<strong>in</strong>g„s log likelihood ratio (Kiss 2007)<br />
8
<strong>Preposition</strong>-<strong>Noun</strong> <strong>Constructions</strong> –<br />
Properties – Productivity<br />
Corpus<br />
Size<br />
(million<br />
words)<br />
N<br />
(tokens)<br />
V(N)<br />
(types)<br />
V(1,N)<br />
(hapax)<br />
E[V(1,N)]<br />
(exp.<br />
hapax)<br />
P(N)<br />
(productivity)<br />
P(N)<br />
(plural)<br />
6 551 110 74 96.66 0.175 0.451<br />
12 1,086 164 104 123.47 0.114 0.438<br />
26 2,185 267 169 166.41 0.076 0.347<br />
52 4,336 435 282 249.93 0.058 0.288<br />
78 6,343 564 384 332.19 0.052 0.241<br />
107 8,526 684 469 400.43 0.047 0.217<br />
213 16,427 1,086 746 748.81 0.046 0.197<br />
Productivity of unter (under) and s<strong>in</strong>gular and plural count nouns respectively based on<br />
models by Baayen (2001), Evert (2004) and Dömges et al. (2007) (Kiss 2007)<br />
9
<strong>Preposition</strong>-<strong>Noun</strong> <strong>Constructions</strong> –<br />
Further Properties<br />
� Productive and compositional, but restricted <strong>in</strong> ill-understood<br />
ways<br />
unter (der) Voraussetzung des E<strong>in</strong>verständnisses des Ensembles<br />
unter *(der) Prämisse des E<strong>in</strong>verständnisses des Ensembles<br />
‘on (the) condition of the approval by the ensemble’<br />
� Can often be replaced by regular PPs without a change <strong>in</strong><br />
mean<strong>in</strong>g<br />
Milosevic unterschrieb auch unter Androhung/der<br />
Milosevic signed even under threatsg / the<br />
Androhung/Androhungen von NATO-Bombardementen nicht.<br />
threat / threatspl of bomb attack-by-NATO not<br />
‘Milosevic did not even sign under pa<strong>in</strong> of NATO bomb attacks.’<br />
� Speakers do not have good <strong>in</strong>tuitions about PNCs and are<br />
reluctant to produce or judge new examples<br />
10
Extract<strong>in</strong>g <strong>Preposition</strong>-<strong>Noun</strong><br />
<strong>Constructions</strong> – Corpus<br />
� We use volumes 1993 to1999 of Neue Zürcher Zeitung<br />
(NZZ), a <strong>German</strong>-language newspaper from Zürich.<br />
� This corpus comprises more than 208 million tokens.<br />
� A daily issue conta<strong>in</strong>s about 98,000 tokens on average.<br />
� We are plann<strong>in</strong>g to at least double the size of our corpus<br />
� Volumes 2000 to 2004 of NZZ<br />
� Volumes 1986 to 1999 of taz (taken from the TüPP-D/Z corpus)<br />
11
Extract<strong>in</strong>g <strong>Preposition</strong>-<strong>Noun</strong><br />
<strong>Constructions</strong> – Automatic Annotation<br />
� Tokenization<br />
� Sentence boundary detection<br />
(PUNKT system, Kiss and Strunk 2006)<br />
� Heuristic recognition of document structure<br />
� Headl<strong>in</strong>es<br />
� Paragraphs<br />
� Important s<strong>in</strong>ce headl<strong>in</strong>es allow telegraph style with<br />
anomalous lack of determ<strong>in</strong>ers<br />
Kanzler<strong>in</strong> gibt Garantie für Spare<strong>in</strong>lagen<br />
‘Chancelor provides guarantee <strong>for</strong> sav<strong>in</strong>gs’<br />
(http://www.f<strong>in</strong>areo.de/news/kanzler<strong>in</strong>-gibt-garantie-fuer-spare<strong>in</strong>lagen.php)<br />
12
Extract<strong>in</strong>g <strong>Preposition</strong>-<strong>Noun</strong><br />
<strong>Constructions</strong> – Automatic Annotation<br />
� Regression Forest Tagger (RFT) (Schmid and Laws 2008)<br />
� Robust POS tagg<strong>in</strong>g with high accuracy<br />
� Morphological analysis and lemmatization based on<br />
SMOR (Schmid et al. 2004)<br />
� Tra<strong>in</strong>ed on complete lexicon extracted from the corpus<br />
� Provides number feature <strong>for</strong> nouns<br />
� Prelim<strong>in</strong>ary evaluation resulted <strong>in</strong> over 97 % accuracy wrt. number<br />
evaluated on a small sample of 500 nouns taken from NZZ 1995<br />
� TreeTagger (Schmid 1995)<br />
� Additional POS annotation<br />
� Non-recursive m<strong>in</strong>imal chunks<br />
� Most importantly preposition chunks (PC)<br />
13
Extract<strong>in</strong>g <strong>Preposition</strong>-<strong>Noun</strong><br />
<strong>Constructions</strong> – Aggregation<br />
� We aggregate the output of<br />
RFT and TT <strong>in</strong> one <strong>for</strong>mat<br />
� Inl<strong>in</strong>e XML <strong>for</strong>mat<br />
� And def<strong>in</strong>e global IDs<br />
� Used across all tools and<br />
<strong>for</strong>mats<br />
� Hierarchical<br />
� Human readable<br />
NZZ_1994_04_27_a32_seg5_s13_t4<br />
� We will switch to stand-off<br />
<strong>for</strong>mats<br />
� MMAX2 <strong>for</strong> manual annotation<br />
� PAULA <strong>in</strong> the long run<br />
Name of newspaper NZZ<br />
Year 1994<br />
Month 04<br />
Day 27<br />
Number of article<br />
(<strong>in</strong> daily issue)<br />
Number of segment<br />
(<strong>in</strong> article)<br />
Number of sentence<br />
(<strong>in</strong> segment)<br />
Number of token<br />
(<strong>in</strong> sentence)<br />
32<br />
5<br />
13<br />
4<br />
14
Extract<strong>in</strong>g <strong>Preposition</strong>-<strong>Noun</strong><br />
<strong>Constructions</strong> – Inl<strong>in</strong>e XML Format<br />
[…]<br />
[…]<br />
[…]<br />
<br />
unter<br />
Verweis<br />
<br />
<br />
auf<br />
die<br />
laufenden<br />
Ermittlungen<br />
[…]<br />
<br />
15
Extract<strong>in</strong>g <strong>Preposition</strong>-<strong>Noun</strong> <strong>Constructions</strong><br />
– Countability Classification<br />
� Build<strong>in</strong>g a tra<strong>in</strong><strong>in</strong>g set<br />
� Manual annotation of 10,000 <strong>German</strong> noun lemmas<br />
<strong>for</strong> their countability preference (Allan 1980)<br />
� Auto (car) � +count<br />
� Wasser (water) � -count<br />
� Per<strong>for</strong>med by four tra<strong>in</strong>ed l<strong>in</strong>guists<br />
� Lemmas that the annotators disagreed on were discarded<br />
� Lemmas that did not show a class-plausible ratio of s<strong>in</strong>gular and<br />
plural occurrences (based on RFT) were also dismissed<br />
� Result: 4,267 classified nouns (74 % +count, 26 % -count)<br />
16
Extract<strong>in</strong>g <strong>Preposition</strong>-<strong>Noun</strong> <strong>Constructions</strong><br />
– Countability Classification<br />
� Naïve Bayes Classifier<br />
� Probability of noun be<strong>in</strong>g<br />
+count given its context C<br />
P(+count|C)<br />
� Context: immediately<br />
preced<strong>in</strong>g token<br />
� RFT-POS<br />
� TT-POS<br />
� Lemma<br />
� Contexts are aggregated<br />
to classify noun types<br />
Context (C) +count -count P(C|+count)<br />
PIAT PRO viel 0 1765 0.0005<br />
KOKOM CONJ wie 327 1200 0.2145<br />
VMFIN VFIN sollen 37 237 0.1376<br />
ART ART e<strong>in</strong>en 246 15 0.9391<br />
PIAT PRO ke<strong>in</strong>e 4287 2969 0.5907<br />
… 819 different contexts …<br />
� Classification thresholds<br />
P(+count|C) < 0.35 � -count<br />
P(+count|C) > 0.5 � +count<br />
otherwise � unknown<br />
17
Extract<strong>in</strong>g <strong>Preposition</strong>-<strong>Noun</strong> <strong>Constructions</strong><br />
– Countability Classification<br />
� Decision Tree Classifier (J4.8)<br />
� Tra<strong>in</strong>ed on 7,814 noun types<br />
� Us<strong>in</strong>g 10-fold crossvalidation<br />
� Uses s<strong>in</strong>gular/total ratio of noun as sole feature<br />
� Accuracy over 90 %<br />
� S<strong>in</strong>gular/total-ratio 0.98<br />
| S<strong>in</strong>gular/total-ratio 0.997: -count<br />
18
Extract<strong>in</strong>g <strong>Preposition</strong>-<strong>Noun</strong> <strong>Constructions</strong><br />
– Countability Classification<br />
� Comb<strong>in</strong>ation of classifiers<br />
� Classifiers have to agree<br />
� Otherwise the noun is classified as unkown<br />
� Evaluated on 100 nouns classified as +count and<br />
100 nouns classified as -count<br />
� Accuracy <strong>for</strong> +count: 93 %<br />
� Accuracy <strong>for</strong> -count: 88 %<br />
� We plan to use bootstrapp<strong>in</strong>g to <strong>in</strong>crease the size of the<br />
tra<strong>in</strong><strong>in</strong>g set<br />
19
Extract<strong>in</strong>g <strong>Preposition</strong>-<strong>Noun</strong><br />
<strong>Constructions</strong> – Extraction<br />
� Us<strong>in</strong>g (Open) Corpus Workbench / Corpus Query<br />
Processor (Evert 2005)<br />
� Able to <strong>in</strong>dex and query shallow l<strong>in</strong>guistic annotation<br />
� Can cope with huge amounts of data<br />
� Still could not <strong>in</strong>dex and search NZZ 1993-1999 <strong>in</strong> one go<br />
� Only m<strong>in</strong>imal conversion necessary from <strong>in</strong>l<strong>in</strong>e XML <strong>for</strong>mat<br />
� In<strong>for</strong>mation about tokens � positional attributes<br />
� In<strong>for</strong>mation about larger units � structural attributes<br />
20
Extract<strong>in</strong>g <strong>Preposition</strong>-<strong>Noun</strong><br />
<strong>Constructions</strong> – Extraction<br />
� CQP query<br />
<br />
[(word="an" %cd) & (rft_pos="APPR")]<br />
[(rft_pos!="(ART|…)") &<br />
(tt_pos!="(ART|…)")]*<br />
[(tt_pos="NN") & (rft_morph!=".*\.Pl\..*") &<br />
(countability="count")]<br />
<br />
� List of 23 prepositions that typically take noun complements and<br />
assign them case<br />
� an, auf, bei, b<strong>in</strong>nen, dank, durch, für, gegen, gemäß, h<strong>in</strong>ter, <strong>in</strong>, mit, mittels,<br />
nach, neben, ohne, seit, über, um, unter, vor, während, wegen<br />
21
Extract<strong>in</strong>g <strong>Preposition</strong>-<strong>Noun</strong><br />
<strong>Constructions</strong> – Evaluation<br />
� Creat<strong>in</strong>g a Gold Standard<br />
� One daily issue of NZZ chosen randomly<br />
� April 3, 1997<br />
� 6,081 sentences and 91,357 tokens<br />
� Extraction of all occurrences of the 23 prepositions<br />
� 5,304 hits<br />
� Based only on their word <strong>for</strong>m<br />
� Manually marked all true PNCs<br />
� 161 true PNCs<br />
22
Extract<strong>in</strong>g <strong>Preposition</strong>-<strong>Noun</strong><br />
<strong>Constructions</strong> – Evaluation<br />
� Test set<br />
� Extraction of putative PNCs headed by the 23 prepositions<br />
us<strong>in</strong>g the CQP query and the automatic annotation<br />
� Results<br />
Hits Precision Recall F-Measure<br />
+count 56 48.21 % 16.77 % 24.88 %<br />
+count and unknown 610 23.44 % 88.82 % 37.09 %<br />
� -count as a knockout criterion <strong>in</strong>creases recall more than fivefold<br />
� High recall is more important to us than high precision<br />
23
Extract<strong>in</strong>g <strong>Preposition</strong>-<strong>Noun</strong><br />
<strong>Constructions</strong> – Discussion<br />
� Low recall<br />
� Data sparseness � PNCs are <strong>in</strong>deed productive!<br />
� Very cautious strategy of countability classification result<strong>in</strong>g <strong>in</strong> a low<br />
coverage of the countability classifier (cf. Stadtfeld et al. 2009)<br />
� Be less cautious and <strong>in</strong>crease amount of tra<strong>in</strong><strong>in</strong>g data <strong>for</strong><br />
countability classification<br />
� Less than optimal precision<br />
� Determ<strong>in</strong>erless nouns <strong>in</strong> headl<strong>in</strong>es<br />
� Idiomatic preposition noun comb<strong>in</strong>ations<br />
� B<strong>in</strong>om<strong>in</strong>al constructions such as unter Dach und Fach<br />
under roof and case<br />
‘done and dusted’<br />
� Chunk<strong>in</strong>g errors<br />
� Try to exclude headl<strong>in</strong>es and b<strong>in</strong>om<strong>in</strong>al constructions<br />
by ref<strong>in</strong><strong>in</strong>g CQP query<br />
24
Outlook – Manual Annotation<br />
� Extraction of sentence IDs from the output of CQP<br />
� Aggregation <strong>in</strong> small batches of 50 sentences as<br />
convenient work<strong>in</strong>g packages<br />
� Manual annotation<br />
� Tool: MMAX2 (Müller and Strube 2006)<br />
� Stand-off annotation<br />
� Unlimited number of annotation layers that can be added <strong>in</strong>crementally<br />
� Markables and po<strong>in</strong>ter relations between them<br />
� High customizable<br />
� Levels of manual annotation<br />
� Lexical semantics of noun and preposition<br />
� Morphology complexity and etymological status of the noun<br />
� Valency of the noun<br />
� Metadata<br />
� …<br />
25
Outlook – Manual Annotation –<br />
MMAX2<br />
26
Outlook – Further Annotation Cycles<br />
corpora<br />
count-mass<br />
Naïve Bayes<br />
Weka J48<br />
automatic annotation<br />
Tagg<strong>in</strong>g (RFT, TT)<br />
Lemmatizer<br />
SMOR Inflectional Morphology<br />
Phrasal Chunks (TT)<br />
stand-off annotation<br />
[P … N]<br />
PP, NP<br />
manual annotation<br />
subcategorization<br />
lexical semantics<br />
lexeme properties<br />
analysis<br />
Weka<br />
Pr<strong>in</strong>cipal Component<br />
Analysis<br />
L<strong>in</strong>ear Modell<strong>in</strong>g<br />
PP, NP<br />
[P … N]<br />
27
Conclusion<br />
� Work on relatively rare, l<strong>in</strong>guistically “non-core” and partly irregular<br />
constructions such as PNCs<br />
� Cannot be based on speaker <strong>in</strong>tuitions<br />
� Necessitates the use of large corpora and annotation m<strong>in</strong><strong>in</strong>g<br />
(Chiarcos et al. 2008)<br />
� Use of automatic annotation tools and countability classification<br />
shows great promise <strong>for</strong> this k<strong>in</strong>d of research<br />
� There is still much room <strong>for</strong> improvement <strong>in</strong> the extraction of PNCs<br />
� Many practical problems<br />
� Amount of data too large <strong>for</strong> many tools<br />
� Much of the work consists <strong>in</strong> convert<strong>in</strong>g between data <strong>for</strong>mats<br />
� Difficult to build and ma<strong>in</strong>ta<strong>in</strong> the <strong>in</strong>frastructure <strong>for</strong> such a data-<strong>in</strong>tensive<br />
l<strong>in</strong>guistic research project<br />
28
Acknowledgements<br />
� The research reported here was partially funded by the<br />
Deutsche Forschungsgeme<strong>in</strong>schaft (DFG) under grant<br />
no. KI 759/5-1.<br />
� We would like to thank very much<br />
� Stefanie Dipper<br />
� Helmut Schmid<br />
29
Literature<br />
Keith Allan. 1980. <strong>Noun</strong>s and countability. Language 56(3): 541-567.<br />
R. Harald Baayen. 2001. Word Frequency Distributions. Kluwer, Dordrecht.<br />
Timothy Baldw<strong>in</strong>, John Beavers, Leonoor van der Beek, Francis Bond, Dan Flick<strong>in</strong>ger,<br />
and Ivan A. Sag. 2006. In search of a systematic treatment of determ<strong>in</strong>erless PPs.<br />
In Patrick Sa<strong>in</strong>t-Dizier, editor, Syntax and Semantics of <strong>Preposition</strong>s. Spr<strong>in</strong>ger,<br />
Dordrecht, pages 163-179.<br />
Christian Chiarcos, Stefanie Dipper, Michael Götze, Ulf Leser, Anke Lüdel<strong>in</strong>g, Julia Ritz,<br />
and Manfred Stede. 2008. A flexible framework <strong>for</strong> <strong>in</strong>tegrat<strong>in</strong>g annotations from<br />
different tools and tagsets. Traitement Automatique des Langues. Special Issue<br />
Plat<strong>for</strong>ms <strong>for</strong> Natural Language Process<strong>in</strong>g. ATALA, 49 (2).<br />
Florian Dömges, Tibor Kiss, Antje Müller, and Claudia Roch. 2007. Measur<strong>in</strong>g the<br />
productivity of determ<strong>in</strong>erless PPs. In Proceed<strong>in</strong>gs of the ACL 2007 Workshop on<br />
<strong>Preposition</strong>s, pages 31-37, Prague, Czech Republic.<br />
Stefan Evert. 2004. A Simple LNRE Model <strong>for</strong> Random Character Sequences.<br />
In Proceed<strong>in</strong>gs of the 7ème Journées Internationales d‘Analyse Statistique des<br />
Données Textuelles, pages 411-422, Louva<strong>in</strong>-la-Neuve, Belgium.<br />
Stefan Evert. 2005. The CQP Query Language Tutorial (CWB version 2.2.b90).<br />
Institut für Masch<strong>in</strong>elle Sprachverarbeitung, University of Stuttgart. 30
Literature<br />
Wolfgang Fleischer. 1982. Phraseologie der deutschen Gegenwartssprache. VEB,<br />
Leipzig.<br />
Nikolaus Himmelmann. 1998. Regularity <strong>in</strong> irregularity: Article use <strong>in</strong> adpositional<br />
phrases. L<strong>in</strong>guistic Typology, 2:315–353.<br />
Tibor Kiss. 2007. Produktivität und Idiomatizität von Präposition-Substantiv-Sequenzen.<br />
Zeitschrift für Sprachwissenschaft, 26(2): 317-345.<br />
Tibor Kiss and Jan Strunk. 2006. Unsupervised multil<strong>in</strong>gual sentence boundary detection.<br />
Computational L<strong>in</strong>guistics, 32(4): 485-525.<br />
Tom Mitchell. 1997. Mach<strong>in</strong>e Learn<strong>in</strong>g. McGraw-Hill, New York.<br />
Christoph Müller and Michael Strube. 2006. Multilevel annotation of l<strong>in</strong>guistic data with<br />
MMAX2. In Sab<strong>in</strong>e Braun, Kurt Kohn, and Joybrato Mukherjee, editors, Corpus<br />
Technology and Language Pedagogy. New Resources, New Tools, New Methods.<br />
(English Corpus L<strong>in</strong>guistics Vol. 3). Peter Lang, Frankfurt, pages 197-214.<br />
Randolph Quirk, Sidney Greenbaum, Geoffrey Leech, and Jan Svartvik. 1985. A<br />
Comprehensive Grammar of the English Language. Longman, London.<br />
Helmut Schmid. 1995. Improvements <strong>in</strong> part-of-speech tagg<strong>in</strong>g with an application to<br />
<strong>German</strong>. In Proceed<strong>in</strong>gs of the EACL SIGDAT Workshop, Dubl<strong>in</strong>, Ireland.<br />
31
Literature<br />
Helmut Schmid, Arne Fitschen, and Ulrich Heid. 2004. SMOR: A <strong>German</strong> computational<br />
morphology cover<strong>in</strong>g derivation, composition, and <strong>in</strong>flection. In Proceed<strong>in</strong>gs of LREC<br />
2004, pages 1263-1266, Lisbon, Portugal.<br />
Helmut Schmid and Florian Laws. 2008. Estimation of conditional probabilities with<br />
decision trees and an application to f<strong>in</strong>e-gra<strong>in</strong>ed POS tagg<strong>in</strong>g. In Proceed<strong>in</strong>gs of<br />
COLING 2008, Manchester, UK.<br />
Tobias Stadtfeld, Tibor Kiss, Antje Müller, and Claudia Roch. 2009. Cha<strong>in</strong><strong>in</strong>g classifiers<br />
to determ<strong>in</strong>e noun countability. Submitted to ACL 2009, S<strong>in</strong>gapore.<br />
Beata Traw<strong>in</strong>ski. 2003. The syntax of “Complex <strong>Preposition</strong>s” <strong>in</strong> <strong>German</strong>: An HPSG<br />
Approach. In Proceed<strong>in</strong>gs of GLIP, volume 5, pages 155–166.<br />
Beata Traw<strong>in</strong>ski, Manfred Sailer, and Jan-Philipp Soehn. 2006. Comb<strong>in</strong>atorial Aspects of<br />
Collocational <strong>Preposition</strong>al Phrases. In Patrick Sa<strong>in</strong>t-Dizier, editor, Syntax and<br />
Semantics of <strong>Preposition</strong>s (Text, Speech and Language Technology 29), pages 181-<br />
196, Spr<strong>in</strong>ger.<br />
32