Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...
3.2. Lexical-Semantic Information Extraction
of a language. They are substantial sources of general lexical knowledge (Briscoe,
1991) and “authorities” on word senses (Kilgarriff, 1997), which are described in
textual definitions written by lexicographers, the experts in the field.
Despite several attempts at automatically creating a broad-coverage LKB for
English, Princeton WordNet, a manual effort, ended up being the leading resource
of this kind (Sampson, 2000). As discussed in section 3.1.1, the existence of a
wordnet in one language has a positive impact on the development of NLP tools for
that language. Nevertheless, despite the wide acceptance of WordNet, research on
LSIE continues, not only from dictionaries, but especially from corpora and other
unstructured resources, whether for the enrichment of WordNet (see section 3.3)
or for the creation of alternative LKBs, including LKBs in non-English languages.
In this section, we start with a brief chronology of LSIE from dictionaries. Then,
we present work on LSIE from corpora and on IE from other unstructured textual
resources.
3.2.1 Information Extraction from Electronic Dictionaries<br />
In <strong>the</strong> beginning<br />
During the 1970s and throughout the 1980s, electronic dictionaries started to be
the target of empirical studies (e.g. Calzolari et al. (1973); Amsler (1980); Michiels
et al. (1980)) aimed at exploiting them in the automatic construction of
an LKB. This kind of knowledge base would ease access to morphological and
semantic information about the defined words (Calzolari et al., 1973), which would
then be very useful for performing NLP tasks.
These early works confirmed that the vocabulary used in dictionaries is limited,
which makes them easier to process for obtaining semantic or syntactic relations
(Michiels et al., 1980). They concluded that textual definitions are often
structured around a genus and a differentia (Amsler, 1980):
• The genus identifies the superordinate concept of the definiendum – the
definiendum is an instance or a “type of” the genus, which means that the
former is a hyponym of the latter.
• The differentia contains the specific properties that distinguish the definiendum
from other instances of the superordinate concept.
Given that this kind of structure is well suited to
the automatic acquisition of taxonomies, Amsler (1981) proposes a taxonomy for
English nouns and verbs. The extracted structures, dubbed tangled hierarchies,
were created after the analysis of dictionary definitions and manual disambiguation
of the head word of each definition. Amsler (1981) concluded that dictionaries
clearly represent two taxonomic relations: is-a (hypernymy) and is-part (part-of).
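As an illustration only, the sketch below shows how is-a pairs taken from definition head words could be assembled into a hierarchy in which a word may have more than one superordinate, which is how "tangled" is read here. It is not Amsler's procedure: the function names `build_hierarchy` and `ancestors` and the toy pairs are invented for this example, and the head words are assumed to be disambiguated already.

```python
from collections import defaultdict


def build_hierarchy(pairs):
    """Map each word to the set of its direct superordinates (is-a links)."""
    parents = defaultdict(set)
    for hyponym, hypernym in pairs:
        parents[hyponym].add(hypernym)
    return parents


def ancestors(parents, word):
    """Collect every superordinate reachable from `word`.

    The `seen` set keeps the walk finite even if the extracted
    pairs happen to contain a cycle.
    """
    seen = set()
    stack = [word]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Invented toy pairs (hyponym, hypernym); "dog" has two direct parents,
# so the result is a graph rather than a strict tree.
pairs = [
    ("dog", "mammal"),
    ("dog", "pet"),
    ("mammal", "animal"),
    ("pet", "animal"),
]
hierarchy = build_hierarchy(pairs)
dog_ancestors = ancestors(hierarchy, "dog")
```

Storing a set of parents per word, rather than a single parent, is the design choice that lets one word sit under several superordinates.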
Calzolari (1984) suggests a set of frequent patterns in dictionary definitions,
and examines the occurrence of the hyponymy and “restriction” relations. She
claims that hyponymy is the most important and evident relation in the lexicon and
confirms that it can be easily extracted from a dictionary after identifying the genus and
the differentia.
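To make the idea of identifying the genus and the differentia concrete, here is a minimal sketch that uses a naive head-word heuristic. It is not the method of any of the cited works. The marker and determiner lists, the function name `split_definition` and the sample definitions are all assumptions made for this example; a realistic system would rely on part-of-speech tagging or parsing.

```python
import re

# Hypothetical word lists, chosen only for this illustration.
DETERMINERS = {"a", "an", "the", "any", "some"}
DIFFERENTIA_MARKERS = {"that", "which", "who", "whose", "with", "used", "having"}


def split_definition(definition):
    """Return (genus, differentia_tokens) for a noun definition.

    Heuristic: drop a leading determiner, then take as genus the word
    just before the first differentia marker; if there is no marker,
    take the last word of the definition.
    """
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", definition.lower())
    if tokens and tokens[0] in DETERMINERS:
        tokens = tokens[1:]
    if not tokens:
        return None, []
    head = len(tokens) - 1  # default: last word is the genus
    for i, token in enumerate(tokens):
        if i > 0 and token in DIFFERENTIA_MARKERS:
            head = i - 1
            break
    genus = tokens[head]
    differentia = tokens[:head] + tokens[head + 1:]
    return genus, differentia


# Invented sample definition for the headword "dog".
genus, differentia = split_definition("A domesticated mammal that barks")
hyponymy_pair = ("dog", genus)  # (hyponym, hypernym)
```

The heuristic fails on definitions headed by "empty" words such as "a kind of ...", which is one reason the lists above should be treated as placeholders.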
Markowitz et al. (1986) identified a set of textual patterns that occur at the beginning
of the definitions of a dictionary. Those patterns are used to denote relations