Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...
• They are not built for Portuguese from scratch, and thus have to deal with translation issues and include problems such as lexical gaps;
• They do not handle word senses, which might lead to inconsistencies regarding lexical ambiguity.
Given this scenario, we set as our goal the development of computational tools for acquiring, structuring and integrating lexical-semantic knowledge from text. Although some of these tools can be used independently, their development had in mind the exploitation of Portuguese resources and the aim of creating a new lexical ontology for Portuguese, in which the aforementioned limitations are minimised. Consequently, the resulting resource would be:
• Public domain, and thus free to be used by anyone, whether in a research or in a commercial setting. We believe this is the best way for the resource to play its role in helping to advance the state of the art of Portuguese NLP. Furthermore, a larger community of users tends to provide important feedback, useful for improving the resource.
• Created automatically, by exploiting textual resources and other public LKBs, all created from scratch for one or more variants of Portuguese. An automatic construction enables the creation of larger and broader resources, at the cost of lower reliability, which is still acceptable for most tasks.
• Structured according to the wordnet model. This choice relies on the wide acceptance of this model and on the broad range of algorithms that work over this kind of structure to perform various NLP tasks.
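The wordnet model mentioned above can be pictured as a set of synsets (groups of synonymous lexical items) connected by typed semantic relations. The following is a minimal sketch of such a structure; the Portuguese words, synset identifiers and relation names are illustrative assumptions, not data from the resource itself.

```python
# Minimal sketch of a wordnet-like structure: synsets are sets of
# synonymous lexical items, connected by typed semantic relations.
# All words, ids and relation names below are made-up examples.

synsets = {
    "s1": {"carro", "automóvel", "viatura"},  # car
    "s2": {"veículo"},                        # vehicle
    "s3": {"roda"},                           # wheel
}

# Relation instances as triples: (source synset, predicate, target synset).
relations = [
    ("s1", "hyponym_of", "s2"),  # a car is a kind of vehicle
    ("s3", "part_of", "s1"),     # a wheel is part of a car
]

def hypernyms(sid):
    """Return the synsets that synset `sid` is a hyponym of."""
    return [t for s, p, t in relations if s == sid and p == "hyponym_of"]

print(hypernyms("s1"))  # ['s2']
```

Many of the algorithms alluded to in the text (e.g. semantic similarity or word sense disambiguation) traverse exactly this kind of labelled graph.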
1.2 Approach<br />
Our flexible approach to the acquisition, organisation and integration of lexical-semantic knowledge involves three main automatic steps. The steps are independent of one another, and each can be used on its own for simpler tasks. Alternatively, their combination enables the integration of lexical-semantic knowledge from different heterogeneous sources and results in a wordnet-like ontology. The three steps are briefly described as follows:
1. Extraction: instances of semantic relations, held between lexical items, are automatically extracted from text. As long as the extracted instances are represented as triples (two items connected by a predicate), the extraction techniques used in this step do not affect the following steps. In the specific case of our work, we performed pattern-based extraction over dictionary definitions.
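To illustrate the kind of pattern-based extraction described in this step, the sketch below matches simple textual patterns against a dictionary definition and emits relation triples. The patterns, relation names and example definitions are hypothetical, chosen only to show the technique, not the actual grammars used in this work.

```python
import re

# Illustrative pattern-based extraction of relation triples from
# dictionary definitions. Each pattern maps a textual cue in the
# definition of an entry to a predicate; these cues and predicates
# are made-up examples, not the grammars used in the actual system.

PATTERNS = [
    # definition of X contains "tipo de Y" -> (X, hyponym_of, Y)
    (re.compile(r"tipo de (\w+)"), "hyponym_of"),
    # definition of X contains "parte de Y" -> (X, part_of, Y)
    (re.compile(r"parte de (\w+)"), "part_of"),
]

def extract(entry, definition):
    """Return (entry, predicate, argument) triples found in a definition."""
    triples = []
    for pattern, predicate in PATTERNS:
        for match in pattern.finditer(definition):
            triples.append((entry, predicate, match.group(1)))
    return triples

print(extract("carro", "tipo de veículo com quatro rodas"))
# [('carro', 'hyponym_of', 'veículo')]
```

Because the output is a plain list of triples, any other extraction technique producing the same representation could be plugged into the pipeline, as the text notes.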
2. Thesaurus enrichment and clustering: if there is a conceptual base with synsets for the target language, its synsets are augmented with the extracted synonymy relations. For this purpose, the network established by all extracted synonymy instances (synpairs) is exploited to compute the similarity between each synset and each synpair. Both elements of a synpair are then added to their most similar synset. As for synpairs with two lexical items not covered