25.12.2012 Views

Extension and use of GermaNet, a lexical semantic database

Extension and use of GermaNet, a lexical semantic database

Extension and use of GermaNet, a lexical semantic database

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

as the sole argument, <strong>and</strong> can be recognized <strong>and</strong> marked<br />

by the lexicographer. The well-known <strong>semantic</strong> features<br />

like DEVWUDNW ‘abstract’ <strong>and</strong> NRQNUHW ‘concrete’ are too<br />

general for covering the base vocabulary; more specific<br />

<strong>semantic</strong> properties for defining the constraints would be<br />

subject to controversial judgements. Statistical methods on<br />

analyzing the co-occurrences <strong>of</strong> predicates <strong>and</strong><br />

complements in large text corpora (Resnik, 1993; Abe &<br />

Li, 1996) help to determine the adequate level <strong>of</strong><br />

generalization, i.e. 1DKUXQJVPLWWHO 12 ‘food’ as preferred<br />

c<strong>and</strong>idate <strong>of</strong> the verb NRFKHQ ‘cook’. Since<br />

1DKUXQJVPLWWHO can be preferred for both the subject <strong>and</strong><br />

object complement <strong>of</strong> NRFKHQ (depending on the<br />

alternation variant), the selectional preferences should be<br />

mapped to the underlying <strong>semantic</strong> role (see McCarthy &<br />

Korhonen, 1998), i.e. the PATIENT role.<br />

The statistical determination <strong>of</strong> selectional preferences <strong>of</strong><br />

predicates ideally yields the <strong>semantic</strong> role preferences for<br />

being encoded or verified in <strong>GermaNet</strong>.<br />

6HPDQWLF WDJJLQJ<br />

<strong>GermaNet</strong> is being applied in a test phase on the semiautomatic<br />

tagging <strong>of</strong> syntactically disambiguated<br />

sentences, which will provide a first step towards the<br />

development <strong>of</strong> reliable tag-sets for the <strong>semantic</strong><br />

annotation <strong>of</strong> corpora 13 . Each literal <strong>of</strong> the syntactically<br />

annotated output is assigned the best match from among<br />

the <strong>GermaNet</strong> synsets. Anyway, several concepts are<br />

missing in the resource, <strong>and</strong> some meanings do not match<br />

exactly, since either <strong>GermaNet</strong> senses are too general or<br />

too fine-grained for capturing the exact meaning which<br />

the (human) sense-tagger has in mind.<br />

For example, the literal *HVFKLFKWH has 7 readings in<br />

<strong>GermaNet</strong> that can be <strong>semantic</strong>ally clustered <strong>and</strong> reduced<br />

to 3 senses (KLVWRU\: the accumulation <strong>of</strong> happenings in the<br />

past; KLVWRU\ as school subject; VWRU\).<br />

Different aspects <strong>of</strong> the same concept ERRN like its<br />

physical representation as countable object <strong>and</strong> its content<br />

will not be treated as instances <strong>of</strong> regular polysemy, but <strong>of</strong><br />

underspecification.<br />

The degree <strong>of</strong> polysemy within <strong>GermaNet</strong> is quite low,<br />

but WordNet meanings require sense clustering<br />

techniques for reducing the <strong>semantic</strong> search space in<br />

information retrieval (Peters et al., 1998).<br />

Consider the term 0LWWHOVWDQGVDQHNGRWHQ ‘anecdotes<br />

typical for members <strong>of</strong> middle classes’ which was object<br />

to tagging <strong>and</strong> could not be detected in <strong>GermaNet</strong>. This<br />

ad-hoc composition, produced in the domain <strong>of</strong> the<br />

subculture music scene, is not <strong>lexical</strong>ized, though<br />

underst<strong>and</strong>able in the given context. Accessing only the<br />

base noun Anekdote ‘anecdote’ may capture the more<br />

relevant meaning component, but the contribution <strong>of</strong> the<br />

first noun <strong>and</strong> the modus <strong>of</strong> combination will be lost.<br />

Thus, some morphological analysis would be very <strong>use</strong>ful.<br />

Feedback on missing literals or meanings in <strong>GermaNet</strong><br />

is provided. Once the first tag-set is defined, the corpus<br />

training phase can be started.<br />

&RQFOXVLRQ<br />

12 1DKUXQJVPLWWHO corresponds to the same level <strong>of</strong> abstraction<br />

like a EuroWordNet base concept.<br />

13 This application is being carried out by P. Buitelaar <strong>and</strong> his<br />

co-workers at the DFKI, Saarbrücken.<br />

This paper has presented the architecture <strong>of</strong> <strong>GermaNet</strong>,<br />

emphasizing its approach to artificial concepts <strong>and</strong> verb<br />

representation, <strong>and</strong> outlines our basic perspectives<br />

concerning the extension <strong>and</strong> <strong>use</strong> <strong>of</strong> the <strong>database</strong> for<br />

important tasks within NLP like sense-tagging <strong>and</strong> the<br />

acquisition <strong>of</strong> selectional preferences. Extending <strong>and</strong><br />

improving <strong>GermaNet</strong> may support the respective<br />

applications, which, on the other h<strong>and</strong>, hint at deficiencies<br />

<strong>and</strong> inconsistencies <strong>of</strong> our resource. We assume that both<br />

lines <strong>of</strong> actions, <strong>database</strong> extension as well as test<br />

applications, will mutually benefit from one another.<br />

5HIHUHQFHV<br />

Abe, N. & H. Li (1996). Learning Word Association<br />

Norms Using Tree Cut Pair Model. In 3URFHHGLQJV RI<br />

��<br />

,QWHUQDWLRQDO &RQIHUHQFH RQ 0DFKLQH /HDUQLQJ.<br />

Buitelaar, P. (1999). Concepts in Multilingual Information<br />

Retrieval. In 3URFHHGLQJV RI (852/$1 , 252-262).<br />

4 th European Summer School on Human Language<br />

Technology. Iasi, Romania. July, 1999.<br />

Burnage, G. (1995). 7KH &(/(; /H[LFDO 'DWDEDVH<br />

5HOHDVH . Max Planck Institute for Psycholinguistics:<br />

Nijmegen, The Netherl<strong>and</strong>s.<br />

Cr<strong>use</strong>, D.A. (1986). /H[LFDO VHPDQWLFV. Cambridge:<br />

Cambridge University Press.<br />

Fellbaum, C. (1998). :RUG1HW $Q (OHFWURQLF /H[LFDO<br />

'DWDEDVH. Cambridge, Mass.: MIT Press.<br />

Kunze, C. (1999). Semantics <strong>of</strong> Verbs within <strong>GermaNet</strong><br />

<strong>and</strong> EuroWordNet. In: Kordoni, E. (ed.), /H[LFDO<br />

6HPDQWLFV DQG /LQNLQJ LQ &RQVWUDLQW %DVHG 7KHRULHV.<br />

Workshop Proceedings <strong>of</strong> the 11 th European Summer<br />

School in Logic, Language <strong>and</strong> Information, 189-200.<br />

Mädche, A. & S. Staab (2000). Semi-Automatic<br />

Engineering <strong>of</strong> Ontologies from Texts. In 3URFHHGLQJV<br />

RI WKH<br />

��<br />

,QWHUQDWLRQDO &RQIHUHQFH RQ 6RIWZDUH<br />

(QJLQHHULQJ DQG .QRZOHGJH (QJLQHHULQJ 6(.( .<br />

McCarthy, D. & A. Korhonen (1998). Detecting Verbal<br />

Participation in Diathesis Alternations. In 3URFHHGLQJV<br />

RI WKH<br />

��<br />

$QQXDO 0HHWLQJ RI WKH $&/. Vol. 2, 1493-<br />

1495. Montreal, Canada.<br />

Miller, G. & R. Beckwith & C. Fellbaum & D. Gross &<br />

K. Miller (1993). )LYH 3DSHUV RQ :RUG1HW. CSL<br />

Report, Vol. 43. Cognitive Science Laboratory,<br />

Princeton University.<br />

PAROLE, German Resources (1998). Produced in LE 2-<br />

4017. Mannheim: Institut für deutsche Sprache.<br />

Peters, W. & I. Peters & P. Vossen (1998). The Reduction<br />

<strong>of</strong> Semantic Ambiguity in Linguistic Resources. In<br />

3URFHHGLQJV RI WKH<br />

��<br />

,QWHUQDWLRQDO &RQIHUHQFH RQ<br />

/DQJXDJH 5HVRXUFHV DQG (YDOXDWLRQ. Granada, Spain.<br />

Resnik, P.S. (1993). 6HOHFWLRQ DQG ,QIRUPDWLRQ $ &ODVV<br />

%DVHG $SSURDFK WR /H[LFDO 5HODWLRQVKLSV. PhD thesis,<br />

University <strong>of</strong> Pennsylvania.<br />

Vossen, P. (1999). EuroWordNet. Building a Multilingual<br />

Database with Lexical-Semantic Networks for the<br />

European Languages. In 3URFHHGLQJV RI (852/$1 ,<br />

262-272). 4 th European Summer School on Human<br />

Language Technology. Iasi, Romania. July, 1999.<br />

Wagner, A. & C. Kunze (1999). Anwendungsperspektiven<br />

des <strong>GermaNet</strong>, eines lexikalisch-semantischen Netzes<br />

für das Deutsche. To appear in: Schröder, B. et al.<br />

(eds.), 3UREOHPH XQG 3HUVSHNWLYHQ FRPSXWHUJHVW W]WHU<br />

/H[LNRJUDSKLH Tübingen:Niemeyer.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!