Extension and use of GermaNet, a lexical semantic database
Extension and use of GermaNet, a lexical semantic database
Extension and use of GermaNet, a lexical semantic database
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
as the sole argument, <strong>and</strong> can be recognized <strong>and</strong> marked<br />
by the lexicographer. The well-known <strong>semantic</strong> features<br />
like DEVWUDNW ‘abstract’ <strong>and</strong> NRQNUHW ‘concrete’ are too<br />
general for covering the base vocabulary; more specific<br />
<strong>semantic</strong> properties for defining the constraints would be<br />
subject to controversial judgements. Statistical methods on<br />
analyzing the co-occurrences <strong>of</strong> predicates <strong>and</strong><br />
complements in large text corpora (Resnik, 1993; Abe &<br />
Li, 1996) help to determine the adequate level <strong>of</strong><br />
generalization, i.e. 1DKUXQJVPLWWHO 12 ‘food’ as preferred<br />
c<strong>and</strong>idate <strong>of</strong> the verb NRFKHQ ‘cook’. Since<br />
1DKUXQJVPLWWHO can be preferred for both the subject <strong>and</strong><br />
object complement <strong>of</strong> NRFKHQ (depending on the<br />
alternation variant), the selectional preferences should be<br />
mapped to the underlying <strong>semantic</strong> role (see McCarthy &<br />
Korhonen, 1998), i.e. the PATIENT role.<br />
The statistical determination <strong>of</strong> selectional preferences <strong>of</strong><br />
predicates ideally yields the <strong>semantic</strong> role preferences for<br />
being encoded or verified in <strong>GermaNet</strong>.<br />
6HPDQWLF WDJJLQJ<br />
<strong>GermaNet</strong> is being applied in a test phase on the semiautomatic<br />
tagging <strong>of</strong> syntactically disambiguated<br />
sentences, which will provide a first step towards the<br />
development <strong>of</strong> reliable tag-sets for the <strong>semantic</strong><br />
annotation <strong>of</strong> corpora 13 . Each literal <strong>of</strong> the syntactically<br />
annotated output is assigned the best match from among<br />
the <strong>GermaNet</strong> synsets. Anyway, several concepts are<br />
missing in the resource, <strong>and</strong> some meanings do not match<br />
exactly, since either <strong>GermaNet</strong> senses are too general or<br />
too fine-grained for capturing the exact meaning which<br />
the (human) sense-tagger has in mind.<br />
For example, the literal *HVFKLFKWH has 7 readings in<br />
<strong>GermaNet</strong> that can be <strong>semantic</strong>ally clustered <strong>and</strong> reduced<br />
to 3 senses (KLVWRU\: the accumulation <strong>of</strong> happenings in the<br />
past; KLVWRU\ as school subject; VWRU\).<br />
Different aspects <strong>of</strong> the same concept ERRN like its<br />
physical representation as countable object <strong>and</strong> its content<br />
will not be treated as instances <strong>of</strong> regular polysemy, but <strong>of</strong><br />
underspecification.<br />
The degree <strong>of</strong> polysemy within <strong>GermaNet</strong> is quite low,<br />
but WordNet meanings require sense clustering<br />
techniques for reducing the <strong>semantic</strong> search space in<br />
information retrieval (Peters et al., 1998).<br />
Consider the term 0LWWHOVWDQGVDQHNGRWHQ ‘anecdotes<br />
typical for members <strong>of</strong> middle classes’ which was object<br />
to tagging <strong>and</strong> could not be detected in <strong>GermaNet</strong>. This<br />
ad-hoc composition, produced in the domain <strong>of</strong> the<br />
subculture music scene, is not <strong>lexical</strong>ized, though<br />
underst<strong>and</strong>able in the given context. Accessing only the<br />
base noun Anekdote ‘anecdote’ may capture the more<br />
relevant meaning component, but the contribution <strong>of</strong> the<br />
first noun <strong>and</strong> the modus <strong>of</strong> combination will be lost.<br />
Thus, some morphological analysis would be very <strong>use</strong>ful.<br />
Feedback on missing literals or meanings in <strong>GermaNet</strong><br />
is provided. Once the first tag-set is defined, the corpus<br />
training phase can be started.<br />
&RQFOXVLRQ<br />
12 1DKUXQJVPLWWHO corresponds to the same level <strong>of</strong> abstraction<br />
like a EuroWordNet base concept.<br />
13 This application is being carried out by P. Buitelaar <strong>and</strong> his<br />
co-workers at the DFKI, Saarbrücken.<br />
This paper has presented the architecture <strong>of</strong> <strong>GermaNet</strong>,<br />
emphasizing its approach to artificial concepts <strong>and</strong> verb<br />
representation, <strong>and</strong> outlines our basic perspectives<br />
concerning the extension <strong>and</strong> <strong>use</strong> <strong>of</strong> the <strong>database</strong> for<br />
important tasks within NLP like sense-tagging <strong>and</strong> the<br />
acquisition <strong>of</strong> selectional preferences. Extending <strong>and</strong><br />
improving <strong>GermaNet</strong> may support the respective<br />
applications, which, on the other h<strong>and</strong>, hint at deficiencies<br />
<strong>and</strong> inconsistencies <strong>of</strong> our resource. We assume that both<br />
lines <strong>of</strong> actions, <strong>database</strong> extension as well as test<br />
applications, will mutually benefit from one another.<br />
5HIHUHQFHV<br />
Abe, N. & H. Li (1996). Learning Word Association<br />
Norms Using Tree Cut Pair Model. In 3URFHHGLQJV RI<br />
��<br />
,QWHUQDWLRQDO &RQIHUHQFH RQ 0DFKLQH /HDUQLQJ.<br />
Buitelaar, P. (1999). Concepts in Multilingual Information<br />
Retrieval. In 3URFHHGLQJV RI (852/$1 , 252-262).<br />
4 th European Summer School on Human Language<br />
Technology. Iasi, Romania. July, 1999.<br />
Burnage, G. (1995). 7KH &(/(; /H[LFDO 'DWDEDVH<br />
5HOHDVH . Max Planck Institute for Psycholinguistics:<br />
Nijmegen, The Netherl<strong>and</strong>s.<br />
Cr<strong>use</strong>, D.A. (1986). /H[LFDO VHPDQWLFV. Cambridge:<br />
Cambridge University Press.<br />
Fellbaum, C. (1998). :RUG1HW $Q (OHFWURQLF /H[LFDO<br />
'DWDEDVH. Cambridge, Mass.: MIT Press.<br />
Kunze, C. (1999). Semantics <strong>of</strong> Verbs within <strong>GermaNet</strong><br />
<strong>and</strong> EuroWordNet. In: Kordoni, E. (ed.), /H[LFDO<br />
6HPDQWLFV DQG /LQNLQJ LQ &RQVWUDLQW %DVHG 7KHRULHV.<br />
Workshop Proceedings <strong>of</strong> the 11 th European Summer<br />
School in Logic, Language <strong>and</strong> Information, 189-200.<br />
Mädche, A. & S. Staab (2000). Semi-Automatic<br />
Engineering <strong>of</strong> Ontologies from Texts. In 3URFHHGLQJV<br />
RI WKH<br />
��<br />
,QWHUQDWLRQDO &RQIHUHQFH RQ 6RIWZDUH<br />
(QJLQHHULQJ DQG .QRZOHGJH (QJLQHHULQJ 6(.( .<br />
McCarthy, D. & A. Korhonen (1998). Detecting Verbal<br />
Participation in Diathesis Alternations. In 3URFHHGLQJV<br />
RI WKH<br />
��<br />
$QQXDO 0HHWLQJ RI WKH $&/. Vol. 2, 1493-<br />
1495. Montreal, Canada.<br />
Miller, G. & R. Beckwith & C. Fellbaum & D. Gross &<br />
K. Miller (1993). )LYH 3DSHUV RQ :RUG1HW. CSL<br />
Report, Vol. 43. Cognitive Science Laboratory,<br />
Princeton University.<br />
PAROLE, German Resources (1998). Produced in LE 2-<br />
4017. Mannheim: Institut für deutsche Sprache.<br />
Peters, W. & I. Peters & P. Vossen (1998). The Reduction<br />
<strong>of</strong> Semantic Ambiguity in Linguistic Resources. In<br />
3URFHHGLQJV RI WKH<br />
��<br />
,QWHUQDWLRQDO &RQIHUHQFH RQ<br />
/DQJXDJH 5HVRXUFHV DQG (YDOXDWLRQ. Granada, Spain.<br />
Resnik, P.S. (1993). 6HOHFWLRQ DQG ,QIRUPDWLRQ $ &ODVV<br />
%DVHG $SSURDFK WR /H[LFDO 5HODWLRQVKLSV. PhD thesis,<br />
University <strong>of</strong> Pennsylvania.<br />
Vossen, P. (1999). EuroWordNet. Building a Multilingual<br />
Database with Lexical-Semantic Networks for the<br />
European Languages. In 3URFHHGLQJV RI (852/$1 ,<br />
262-272). 4 th European Summer School on Human<br />
Language Technology. Iasi, Romania. July, 1999.<br />
Wagner, A. & C. Kunze (1999). Anwendungsperspektiven<br />
des <strong>GermaNet</strong>, eines lexikalisch-semantischen Netzes<br />
für das Deutsche. To appear in: Schröder, B. et al.<br />
(eds.), 3UREOHPH XQG 3HUVSHNWLYHQ FRPSXWHUJHVW W]WHU<br />
/H[LNRJUDSKLH Tübingen:Niemeyer.