Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...
which can be seen as another contribution of this thesis, enabled us to use the same
grammars for extracting information from three dictionaries.
Together with Onto.PT, other lexical-semantic resources were developed by using
each of the ECO steps independently. These resources, listed below, contribute to
advancing the state of the art of Portuguese LKBs:
• CARTÃO, the largest term-based lexical-semantic network for Portuguese;
• CLIP, as far as we know, the first broad-coverage fuzzy thesaurus for Portuguese;
• TRIP, the largest public synset-based thesaurus for Portuguese.
We should add that most of the work performed during this thesis is reported in a
total of 14 scientific papers, presented and/or published at national and
international venues such as IJCAI, ECAI, EPIA and NLDB (see Section 9.1).
1.4 Outline of the thesis
After two chapters on background knowledge and related work, each chapter of this
thesis focuses on an automatic procedure that is part of the ECO approach and
performs one step towards our final goal, the creation of Onto.PT. Besides describing
each procedure, each chapter reports one or more experiments towards its validation.
Before the conclusions, the latest version of Onto.PT available when this thesis
was written is presented, together with examples of scenarios where it might be
useful. Two appendices are included at the end of the thesis: (A), an extensive
list of the semantic relations in Onto.PT and their descriptions; and (B), a rough
manual mapping between the Onto.PT synsets and the core concepts of Princeton
WordNet. We now briefly describe each chapter:
Chapter 2 introduces (mostly) theoretical background knowledge that supports
this research. It starts with some remarks on lexical semantics, the subfield
of semantics that deals with words and meanings. Then, different formalisms for
representing lexical-semantic computational resources are described. Given that
our work is related to the NLP field of information extraction, the last section is
dedicated to this topic.
Chapter 3 covers concrete work related to our research. First, it
describes well-known lexical knowledge bases for English, and also for Portuguese.
Second, it presents work on information extraction from dictionaries and also from
corpora. Third, it reviews work on the automatic enrichment or integration of
existing knowledge bases.
Chapter 4 explains our work towards the acquisition of semantic relations
from dictionaries. As many regularities are kept across definitions in different
dictionaries, we reuse existing handcrafted grammars, made for the extraction of
semantic relations from one dictionary, for extracting relations from other dictionaries.
This results in CARTÃO, a large lexical network for Portuguese that
integrates knowledge from three different dictionaries.
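To make the idea of reusing one set of definition patterns across several dictionaries more concrete, the following is a minimal Python sketch. It is only an illustration under stated assumptions: the regular expressions, the relation names, the toy entries and the functions `extract_relations` and `merge` are all hypothetical, and they are not the handcrafted grammars or the data used in this thesis.

```python
import re

# Hypothetical definition patterns (NOT the grammars used in the thesis).
# Each pattern looks for a regular phrase at the start of a definition and
# captures the word that follows it.
PATTERNS = [
    (re.compile(r"^(?:o mesmo que|sinónimo de) (\w+)", re.IGNORECASE), "synonym-of"),
    (re.compile(r"^parte d[eoa]s? (\w+)", re.IGNORECASE), "part-of"),
    (re.compile(r"^(?:espécie|tipo|género) de (\w+)", re.IGNORECASE), "is-a"),
]


def extract_relations(entries):
    """Return (headword, relation, target) triples from (headword, definition) pairs."""
    triples = []
    for headword, definition in entries:
        for pattern, relation in PATTERNS:
            match = pattern.match(definition.strip())
            if match:
                triples.append((headword, relation, match.group(1).lower()))
                break  # keep only the first matching pattern per definition
    return triples


def merge(*dictionaries):
    """Apply the same patterns to every dictionary and merge the triples into one set."""
    network = set()
    for entries in dictionaries:
        network.update(extract_relations(entries))
    return network


# Toy entries standing in for two different dictionaries (assumed data).
dict_a = [("cão", "espécie de mamífero doméstico"), ("roda", "parte de veículo")]
dict_b = [("cachorro", "o mesmo que cão"), ("cão", "tipo de mamífero")]

network = merge(dict_a, dict_b)
for triple in sorted(network):
    print(triple)
```

Because the triples are stored in a set, a relation found in more than one dictionary appears only once in the merged network, which is one simple way to picture the integration of knowledge from several sources.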