presentation-wordnet..
DanNet
From Dictionary to Wordnet
Jörg Asmussen
Society for Danish Language and Literature, DSL, Copenhagen
Bolette Sandford Pedersen
Centre for Language Technology, CST, University of Copenhagen
Lars Trap-Jensen
Society for Danish Language and Literature, DSL, Copenhagen
Outline
1. Introduction (LTJ, 2 min.)
2. Characteristics of the DDO (LTJ, 5 min.)
3. Building DanNet (BSP, 8 min.)
4. Extraction of differentia info (JA, 7 min.)
5. Conclusions (JA, 2 min.)
DanNet
• Lexical-semantic wordnet for Danish
• Joint project:
  • Society for Danish Language and Literature
  • Centre for Language Technology, University of Copenhagen
• 4 years (2005–2008), ~ 400,000 €
• Limited resources
• Adapt an existing wordnet?
  or
  Reuse other lexical-semantic resources:
  • SIMPLE-DK
  • Den Danske Ordbog, DDO
2. Characteristics of the DDO
Den Danske Ordbog
• Published by DSL 2003–2005
• Corpus-based, DDOC
• 60,000 entries
• Spelling, morphology, pronunciation, meaning, collocations, fixed phrases, syntax, usage, word formation, etymology
Den Danske Ordbog
• Words edited in related groups
• Machine-readable
• Fine-grained microstructure
• 100,000 definitions
Semantic description
• Systematic domain info → concerns relation
• Sense definition → relevant info "manually" extracted
• Hyperonym
• Sense relations, i.e. synonyms
• Collocational information
• Authentic example
Definitions in the DDO
Definition scheme:
• Genus proximum – closest hyperonym, e.g. apparat ‚technical device‘
• Differentia specifica – distinctive feature: the remaining part of the definition
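The definition scheme can be illustrated with a minimal Python sketch. It assumes the genus expression of each definition is already known (the `Definition` class and its field names are hypothetical, not the DDO's actual data format) and treats everything else in the definition as differentia. The example input is the English gloss of the fjernsyn ‚tv set‘ definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Definition:
    headword: str
    text: str   # definition text
    genus: str  # genus proximum, assumed to be annotated already (hypothetical field)


def split_definition(d: Definition) -> tuple[str, str, str]:
    """Return (pre-modifiers, genus, post-modifiers).

    Everything except the genus expression counts as differentia specifica.
    """
    start = d.text.find(d.genus)  # first occurrence of the genus expression
    if start == -1:
        raise ValueError(f"genus {d.genus!r} not found in definition of {d.headword!r}")
    end = start + len(d.genus)
    return d.text[:start].strip(), d.text[start:end], d.text[end:].strip()


tv = Definition(
    headword="fjernsyn",
    text="box-shaped device that can receive tv signals and transform them into animated pictures",
    genus="device",
)
print(split_definition(tv))
```

A plain substring search keeps the sketch short; on real data it would also match the genus inside a longer word, so a token-based match would be safer.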
3. Building DanNet
Building DanNet
• Extract definitions and genus specifications from the DDO
• Include them in the DanNet tool
• Use the tool to develop the data domain by domain:
  1. Homonymy and polysemy
  2. Establishing synsets
  3. Adjusting the hierarchical structure
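Domain-wise development presupposes that entries can be pulled out one domain at a time. The sketch below groups headwords by domain label and then by genus, giving an editor one worklist per domain. The `Entry` record, the domain labels and the genus assignments in the example are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    headword: str
    domain: str  # domain label (hypothetical field name)
    genus: str   # genus proximum extracted from the definition


def domain_worklists(entries):
    """Group headwords by domain and then by genus: one worklist per domain."""
    lists = defaultdict(lambda: defaultdict(list))
    for entry in entries:
        lists[entry.domain][entry.genus].append(entry.headword)
    # Turn the nested defaultdicts into plain dicts for easier inspection.
    return {domain: dict(by_genus) for domain, by_genus in lists.items()}


# Hypothetical domain labels and genus assignments, for illustration only.
ENTRIES = [
    Entry("datalogi", "science", "videnskab"),
    Entry("bromatologi", "science", "videnskab"),
    Entry("gærcelle", "biology", "celle"),
]
print(domain_worklists(ENTRIES)["science"])
```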
Homonymy & polysemy
celle ‚cell‘ is genus proximum of
• gærcelle ‚yeast cell‘
• fængselscelle ‚prison cell‘
Convert lexical expressions into concepts:
• celle-1 ‚part of living organism‘
• celle-2 ‚small room‘
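The step from lexical expressions to concepts can be sketched as two groupings: first by the genus lemma as written in the definition, which lumps gærcelle and fængselscelle together, then by the genus sense the editor has chosen. The functions and dictionaries are hypothetical; only the words and sense labels come from the slide.

```python
from collections import defaultdict


def group_by_lemma(genus_of: dict[str, str]) -> dict[str, list[str]]:
    """Lexical level: group hyponyms under the genus lemma used in their definitions."""
    groups = defaultdict(list)
    for hyponym, genus in genus_of.items():
        groups[genus].append(hyponym)
    return dict(groups)


def group_by_concept(genus_of: dict[str, str], sense_of_genus: dict[str, str]) -> dict[str, list[str]]:
    """Concept level: group hyponyms under the genus sense chosen by the editor."""
    groups = defaultdict(list)
    for hyponym, genus in genus_of.items():
        sense = sense_of_genus.get(hyponym)
        if sense is None:
            raise KeyError(f"no sense of {genus!r} chosen for {hyponym!r}")
        if not sense.startswith(genus + "-"):
            raise ValueError(f"{sense!r} is not a sense of {genus!r}")
        groups[sense].append(hyponym)
    return dict(groups)


genus_of = {"gærcelle": "celle", "fængselscelle": "celle"}
# Manual editorial decision: which sense of "celle" each definition refers to.
sense_of_genus = {"gærcelle": "celle-1", "fængselscelle": "celle-2"}

print(group_by_lemma(genus_of))
print(group_by_concept(genus_of, sense_of_genus))
```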
Establishing synsets
[Diagram: informatik ‚informatics‘, lære ‚studies‘, bromatologi ‚nutrition science‘, fag ‚subject‘, samfundsfag ‚social studies‘, videnskab ‚science‘ and datalogi ‚computer science‘; some of these are grouped into "One synset"]
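Merging senses into synsets amounts to finding groups of mutually synonymous words. A minimal sketch, assuming synonymy judgements arrive as word pairs, uses union-find. Which words on the slide form the synset cannot be recovered from the text, so pairing informatik with datalogi below is an assumption for illustration.

```python
def build_synsets(words, synonym_pairs):
    """Merge words linked by synonymy into synsets using union-find."""
    parent = {w: w for w in words}

    def find(w):
        # Follow parent links to the representative, halving the path as we go.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for a, b in synonym_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    synsets = {}
    for w in words:
        synsets.setdefault(find(w), []).append(w)
    # Sort members and synsets so the result does not depend on input order.
    return sorted(sorted(members) for members in synsets.values())


words = ["informatik", "lære", "bromatologi", "fag", "samfundsfag", "videnskab", "datalogi"]
# Assumed editorial judgement, for illustration only.
pairs = [("informatik", "datalogi")]
print(build_synsets(words, pairs))
```

Union-find makes synonymy transitive: if A is linked to B and B to C, all three end up in one synset, which is exactly the kind of merge an editor should double-check.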
Building the hierarchy
• Hyponymy is generally defined as: X is a Y
• Taxonymy is a subtype of this: X is a kind/type of Y
Cf. Cruse, 1991 and 2002
Example: Hyponymy?
træ ‚tree‘
• kirsebærtræ ‚cherry tree‘
• birketræ ‚birch‘
• vejtræ ‚roadside tree‘: "Orthogonal" hyponymy
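The distinction between taxonymy and orthogonal hyponymy can be recorded as a flag on each hyponymy link. In this sketch the `HyponymLink` class is hypothetical, and marking vejtræ as orthogonal is our reading of the slide. The flag separates true kinds of tree from trees classified by location: co-taxonyms exclude each other (a cherry tree is never a birch), whereas an orthogonal hyponym overlaps with them (a roadside tree may well be a birch).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HyponymLink:
    hyponym: str
    hyperonym: str
    taxonomic: bool  # True: "X is a kind of Y"; False: orthogonal hyponymy


LINKS = [
    HyponymLink("kirsebærtræ", "træ", taxonomic=True),
    HyponymLink("birketræ", "træ", taxonomic=True),
    # A roadside tree is a tree by location, not a kind of tree.
    HyponymLink("vejtræ", "træ", taxonomic=False),
]


def hyponyms(links, hyperonym, taxonomic):
    """List the hyponyms of `hyperonym` whose link has the requested type."""
    return [
        link.hyponym
        for link in links
        if link.hyperonym == hyperonym and link.taxonomic == taxonomic
    ]


print(hyponyms(LINKS, "træ", taxonomic=True))
print(hyponyms(LINKS, "træ", taxonomic=False))
```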
Building the hierarchy
TOP
  genstand ‚object‘
    møbel ‚furniture‘
      siddemøbel ‚sitting furniture‘
        stol ‚chair‘
indbo/bohave ‚household effects‘
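When the hierarchy is adjusted, every concept should still reach TOP through an unbroken chain of hyperonyms. A minimal check, assuming the hierarchy is stored as a concept-to-hyperonym dictionary (a hypothetical representation), walks the chain from the slide and rejects gaps and cycles. The position of indbo/bohave is not stated in the text, so it is left out.

```python
HYPERONYM = {
    "stol": "siddemøbel",
    "siddemøbel": "møbel",
    "møbel": "genstand",
    "genstand": "TOP",
}


def path_to_top(concept, hyperonym=HYPERONYM):
    """Follow hyperonym links from `concept` up to TOP, failing on gaps or cycles."""
    path = [concept]
    while path[-1] != "TOP":
        parent = hyperonym.get(path[-1])
        if parent is None:
            raise KeyError(f"{path[-1]!r} has no hyperonym and is not TOP")
        if parent in path:
            raise ValueError(f"cycle detected: {' -> '.join(path + [parent])}")
        path.append(parent)
    return path


print(" -> ".join(path_to_top("stol")))
```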
Definition composition
• Genus selection – a conscious process
• Differentia:
  • No editorial specifications, i.e. neither a fixed definition vocabulary nor a fixed syntax
Consequences for DanNet:
• Complicates computational exploitation
• Semantic relations are coded manually
Coding relations
• What is done manually:
  • No semantic info other than that found in the DDO
  • Reduction of semantic info
• What is done automatically:
  • Inheritance of relations from hyperonyms
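Automatic inheritance of relations can be sketched as a walk up the hyperonym chain that collects each relation from the nearest concept coding it. The relation names and values below are hypothetical, and the rule that the nearest value wins is an assumption, not a documented DanNet rule.

```python
HYPERONYM = {"stol": "siddemøbel", "siddemøbel": "møbel", "møbel": "genstand", "genstand": "TOP"}

# Manually coded relations per concept (hypothetical names and values).
OWN_RELATIONS = {
    "genstand": {"ontological_type": "Artifact"},
    "møbel": {"for_purpose_of": "indrette"},
    "siddemøbel": {"for_purpose_of": "sidde"},
}


def relations_with_inheritance(concept, hyperonym=HYPERONYM, own=OWN_RELATIONS):
    """Combine a concept's own relations with those inherited from its hyperonyms.

    Walking upwards, the first (nearest) value found for a relation wins,
    so a hyponym overrides what it would otherwise inherit.
    """
    result = {}
    seen = set()  # guards against cycles in a malformed hierarchy
    node = concept
    while node is not None and node not in seen:
        seen.add(node)
        for relation, value in own.get(node, {}).items():
            result.setdefault(relation, value)
        node = hyperonym.get(node)
    return result


print(relations_with_inheritance("stol"))
```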
4. Extraction of differentia info from definitions
Extraction of telic role
fjernsyn ‚tv set‘
‚box-shaped device that can receive tv signals and transform them into animated pictures on a screen and accompanying sound in the speakers of the device‘
Genus expression: ‚device‘
Telic role: VPs headed by ‚can‘
Hypothesis
‣ VPs in a relative clause which are headed by kan ‚can‘ specify the telic role (i.e. the for_purpose_of relation) of the definiendum
Corpus query
‣ Find all definitions with genus apparat followed by der or som (relative pronouns ‚that, which‘), followed by kan ‚can‘, followed by a word ending in e
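The corpus query translates directly into a regular expression. This sketch assumes plain-text definitions with whitespace between words and reads "followed by" as "immediately followed by"; the test sentences are invented Danish paraphrases, not actual DDO definitions. It captures only the first verb after kan, so a coordinated second verb (such as ‚transform‘ in the fjernsyn definition) is missed.

```python
import re
from typing import Optional

# Genus "apparat", immediately followed by the relative pronoun der or som,
# then "kan", then a word ending in -e (captured as the candidate VP head).
TELIC_QUERY = re.compile(r"\bapparat\s+(?:der|som)\s+kan\s+(\w+e)\b")


def telic_verb(definition: str) -> Optional[str]:
    """Return the word after 'kan' if the definition matches the query, else None."""
    match = TELIC_QUERY.search(definition)
    return match.group(1) if match else None


# Invented test sentences, not actual DDO definitions.
for text in [
    "kasseformet apparat der kan modtage tv-signaler og omdanne dem til billeder",
    "apparat med skærm der kan vise billeder",
    "apparat som kan bruges på kontoret",
]:
    print(telic_verb(text))
```

The second sentence fails because words intervene between the genus and der, and the third because bruges does not end in e.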
Results of corpus query
[Screenshot: query hits, with the VP heads denoting the telic role and the corresponding dictionary entries highlighted]
Only 26 occurrences of this pattern, but 203 apparat definitions, i.e. a coverage of only about 13%
Why such poor coverage?
1. Definitions where the pattern contains interposed material are not captured
2. Structural patterns other than the one given in our hypothesis also indicate a for_purpose_of relation
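The first cause can be addressed by letting the pattern skip a few intervening words. In this sketch the maximum gap (`max_gap`, a hypothetical parameter) applies both between the genus and der/som and between der/som and kan; with `max_gap=0` it reduces to the strict query. The example sentence is invented.

```python
import re


def query_pattern(genus: str, max_gap: int = 3) -> re.Pattern:
    """Build the telic-role query, allowing up to `max_gap` interposed words
    between the genus and der/som, and between der/som and kan."""
    gap = r"(?:\s+\S+){0,%d}?" % max_gap  # lazily skip at most max_gap words
    return re.compile(
        r"\b" + re.escape(genus) + r"\b"
        + gap + r"\s+(?:der|som)\b"
        + gap + r"\s+kan\s+(\w+e)\b"
    )


TEXT = "apparat med skærm der normalt kan vise billeder"  # invented example
strict = query_pattern("apparat", max_gap=0)
relaxed = query_pattern("apparat", max_gap=3)
print(strict.search(TEXT))            # None: the strict query fails on interposed words
print(relaxed.search(TEXT).group(1))  # the VP head after kan
```

A larger gap raises recall at the cost of more false matches, so the extra hits would still need manual checking. The second cause, other structural patterns, is not touched by this change.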
Further patterns
(word-by-word glosses of the Danish patterns; GE = genus expression)
1. GE that can VP-inf
2. GE that is used for to VP-inf with
3. GE for to VP-inf with/on/in
4. GE that VP-fin
5. GE for NP
6. GE that is specially designed for to VP-inf
In each pattern, the head of the VP or NP gives the for_purpose_of relation.
These patterns capture 70% of the apparat definitions.
A statistical approach
• Frequency list of types in definitions with genus apparat,
  compared with
• the frequency list of types in all definitions,
  using a statistical test (e.g. log likelihood)
‣ Salient types are listed for investigation and may give hints on semantic relations
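The comparison can be sketched with Dunning's log-likelihood statistic (G²), one common choice for this kind of keyword comparison. Two assumptions are made: the reference corpus is all definitions minus the apparat definitions, so the two samples are disjoint, and tokens are simply whitespace-separated words. The usage data are invented.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Dunning's G2 for one type: frequency a in a target corpus of c tokens
    versus frequency b in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the target
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def salient_types(target_tokens, all_tokens, top=10):
    """Rank types over-represented in the target definitions by G2."""
    target = Counter(target_tokens)
    # Subtract the target counts so that target and reference are disjoint.
    reference = Counter(all_tokens) - target
    c, d = sum(target.values()), sum(reference.values())
    if c == 0 or d == 0:
        raise ValueError("both target and reference must be non-empty")
    scores = {
        w: log_likelihood(a, reference[w], c, d)
        for w, a in target.items()
        if a / c > reference[w] / d  # keep over-use only, not under-use
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]


apparat = "apparat der kan måle tryk apparat der kan måle fart".split()
others = "person der bor i byen dyr der lever i skoven".split()
for word, score in salient_types(apparat, apparat + others, top=3):
    print(f"{word}\t{score:.2f}")
```

In this toy data, der is frequent in both samples and is therefore not salient, while words such as måle occur only in the apparat definitions.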
Some salient types, with example headwords
• afspille ‚to play back‘: grammofon, cd-afspiller, afspiller, sequencer, diktafon
• afspilning ‚playback‘: kassettespiller, hjemmevideo, kassettebåndoptager, båndoptager
• måle ‚measure‘: stroboskop, måler, timer, løgnedetektor, ekkolod
• måler ‚measuring tool‘: gasmåler, speedometer, omdrejningstæller, benzinmåler, fotofælde
• måling ‚gauging‘: elmåler, trykmåler, luxmeter, spirometer, gyrometer, alkometer, newtonmeter, magnetometer, instrument, måleinstrument, kalorimeter
• målinger ‚measurements‘: radiosonde, satellit, fartskriver
Automatic extraction?
Basically NO...
Developing reliable methods is too expensive!
Automatic extraction?
• Structural and lexical properties of definitions differ considerably
‣ Difficult to automatically extract semantic relations from definitions
‣ Concordances and lists of salient definition types may help the editor
‣ But the DanNet editor still has to do the core job of analysing dictionary definitions
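A keyword-in-context (KWIC) concordance of the kind that may help the editor takes only a few lines. This sketch assumes definitions are available as (headword, text) pairs; the three definitions are invented examples, not DDO text.

```python
import re


def kwic(definitions, keyword, width=25):
    """Return keyword-in-context lines for whole-word hits of `keyword`.

    `definitions` is an iterable of (headword, definition text) pairs.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b")
    lines = []
    for headword, text in definitions:
        for m in pattern.finditer(text):
            left = text[max(0, m.start() - width):m.start()]
            right = text[m.end():m.end() + width]
            # Right-align the left context so all keywords line up in one column.
            lines.append(f"{headword:<15} {left:>{width}} [{m.group(0)}] {right}")
    return lines


DEFINITIONS = [  # invented example definitions, not DDO text
    ("speedometer", "apparat der kan måle et køretøjs fart"),
    ("grammofon", "apparat der kan afspille plader"),
    ("trykmåler", "apparat til måling af tryk"),
]
for line in kwic(DEFINITIONS, "måle"):
    print(line)
```

Matching whole words only keeps måle apart from måling and måler, which the salient-type list treats as separate types.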
5. Conclusions
Conclusion
Reusing the DDO
[Scale from cheap to expensive]
• Semi-automatic exploitation of the dictionary structure:
  • hyponymy structure
  • synonym/antonym info
• Automatic exploitation of definitions proper to find other semantic relations
Conclusion
The DanNet approach
[Scale from cheap to expensive]
• Translation/expansion of existing WNs?
  • Better coherence with other WNs
  • Linguistic bias towards the source language
• Reusing/merging language resources?
  • More loyal to the specific language
  • Expensive, unless based on an existing resource, e.g. a dictionary