24.07.2013 Views

Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...


3.2. Lexical-Semantic Information Extraction

of a language. They are substantial sources of general lexical knowledge (Briscoe, 1991) and “authorities” of word senses (Kilgarriff, 1997), which are described in textual definitions, written by lexicographers, the experts in the field.

Despite several attempts at the automatic creation of a broad-coverage LKB for English, Princeton WordNet, a manual effort, ended up being the leading resource of this kind (Sampson, 2000). As discussed in section 3.1.1, the existence of a wordnet in a language has a positive impact on the development of NLP tools for that language. Nevertheless, despite the wide acceptance of WordNet, research on LSIE continues, not only from dictionaries but especially from corpora and other unstructured resources, whether for the enrichment of WordNet (see section 3.3) or for the creation of alternative LKBs, including LKBs in non-English languages.

In this section, we start with a brief chronology of LSIE from dictionaries. Then, we present work on LSIE from corpora and IE from other unstructured textual resources.

3.2.1 Information Extraction from Electronic Dictionaries<br />

In the beginning

During the 1970s, and throughout the 1980s, electronic dictionaries started to be the target of empirical studies (e.g. Calzolari et al. (1973); Amsler (1980); Michiels et al. (1980)), with a view to exploiting them in the automatic construction of an LKB. This kind of knowledge base would ease access to morphological and semantic information about the defined words (Calzolari et al., 1973), which would then be very useful for accomplishing NLP tasks.

These earlier works confirmed that the vocabulary used in dictionaries is limited, which makes them easier to process for obtaining semantic or syntactic relations (Michiels et al., 1980). They concluded that textual definitions are often structured as a genus and a differentia (Amsler, 1980):

• The genus identifies the superordinate concept of the definiendum: the definiendum is an instance or a “type of” the genus, which means there is a hyponymy relation between the former and the latter.

• The differentia contains the specific properties for distinguishing the definiendum from other instances of the superordinate concept.
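
As an illustration of the genus and differentia structure, the following is a minimal Python sketch. It is not the procedure of any of the works cited here: it assumes a deliberately naive heuristic in which the genus is the first word after a leading determiner and the differentia is whatever follows. The function names, the determiner list, the relation label and the example definition are all invented for this sketch.

```python
import re

# Assumption for illustration only: a small, hand-picked determiner list.
DETERMINERS = {"a", "an", "the", "any", "some"}

def split_definition(definition):
    """Split a short noun definition into (genus, differentia).

    Naive heuristic: skip leading determiners, take the next word as the
    genus, and treat the remaining words as the differentia.
    """
    tokens = re.findall(r"[a-z-]+", definition.lower())
    # Drop leading determiners such as "a" or "the".
    while tokens and tokens[0] in DETERMINERS:
        tokens = tokens[1:]
    if not tokens:
        return None, ""
    return tokens[0], " ".join(tokens[1:])

def hyponymy_triple(definiendum, definition):
    """Return (definiendum, 'hyponym-of', genus), or None if no genus is found."""
    genus, _ = split_definition(definition)
    if genus is None:
        return None
    return (definiendum, "hyponym-of", genus)

# Invented example definition, not taken from any dictionary.
print(split_definition("a bird that cannot fly"))
print(hyponymy_triple("penguin", "a bird that cannot fly"))
```

The heuristic fails as soon as a modifier precedes the genus (for instance an adjective), which is one reason real extraction relies on parsing the definition or on hand-written patterns rather than on word position alone.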

Since this kind of structure lends itself to the automatic acquisition of taxonomies, Amsler (1981) proposes a taxonomy for English nouns and verbs. The extracted structures, dubbed tangled hierarchies, were created after the analysis of dictionary definitions and manual disambiguation of the head word of each definition. Amsler (1981) concluded that dictionaries clearly represent two taxonomic relations: is-a (hypernymy) and is-part (part-of).
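
A taxonomy of this kind can be sketched as a graph of is-a pairs. The Python sketch below reads "tangled" as meaning that a word may have more than one direct hypernym, which is an assumption about the term rather than something stated above. The function names and the example pairs are invented for illustration and do not come from Amsler (1981).

```python
from collections import defaultdict

def build_hierarchy(is_a_pairs):
    """Map each word to the set of its direct hypernyms.

    A word may appear with several hypernyms, so the result is a
    graph rather than a strict tree.
    """
    parents = defaultdict(set)
    for hyponym, hypernym in is_a_pairs:
        parents[hyponym].add(hypernym)
    return parents

def ancestors(word, parents):
    """Collect all direct and indirect hypernyms of a word."""
    seen = set()
    stack = [word]
    while stack:
        current = stack.pop()
        # .get avoids creating empty entries in the defaultdict.
        for parent in parents.get(current, ()):
            if parent not in seen:  # also guards against cycles
                seen.add(parent)
                stack.append(parent)
    return seen

# Invented is-a pairs: "knife" has two direct hypernyms.
pairs = [
    ("knife", "tool"),
    ("knife", "weapon"),
    ("tool", "artifact"),
    ("weapon", "artifact"),
]
hierarchy = build_hierarchy(pairs)
print(sorted(ancestors("knife", hierarchy)))
```

Storing a set of hypernyms per word, instead of a single parent, is what lets the structure hold the multiple inheritance paths that a strict tree would forbid.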

Calzolari (1984) suggests a set of frequent patterns in dictionary definitions, and examines the occurrence of the hyponymy and “restriction” relations. She claims that hyponymy is the most important and evident relation in the lexicon and confirms it can be easily extracted from a dictionary, after identifying the genus and the differentia.

Markowitz et al. (1986) identified a set of textual patterns that occur at the beginning of the definitions of a dictionary. Those patterns are used to denote relations
