Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...
Chapter 9. Final discussion
3. Moving from term-based to synset-based semantic relations, without
using the extraction context (chapter 7).
Also, even though the procedure for extracting semantic relations from dictionaries
cannot be seen as novel, in chapter 4 of this thesis we have presented work on
the comparison of the structure and contents of different dictionaries of
Portuguese. For instance, we have shown that many regularities are kept across
the definitions of each dictionary, which enabled us to use the same grammars for
extracting information from all three dictionaries.
Starting with a set of extracted semantic relations, and combining the aforementioned
procedures in the order in which they were presented, we proposed ECO, a flexible approach
for creating a wordnet-like lexical ontology automatically from text. ECO
was used for Portuguese but, since different methods can be used for the
relation extraction step, the approach itself is language independent.
During this work, each of the previous procedures was used in the construction
of several lexical-semantic resources. These resources, listed below, are in the public domain
and may be used together with applications that we hope will contribute to
advancing the state of the art of the computational processing of Portuguese:
• CARTÃO: the largest term-based lexical-semantic network for Portuguese,
larger than PAPEL, which it includes together with relations extracted from
two other dictionaries (chapter 4).
• CLIP: the first fuzzy thesaurus for Portuguese, completely extracted from
dictionaries (chapter 5).
• TRIP: the largest synset-based thesaurus for Portuguese, larger than TeP,
which it includes together with synonymy information acquired automatically
from dictionaries (chapter 6).
• Onto.PT: a new wordnet-like lexical ontology for Portuguese, extracted
automatically from textual resources, which covers more than 100,000 concepts
(represented as synsets) and more than 170,000 semantic relations (chapter
8). Currently, Onto.PT contains information from five lexical resources,
but the ECO approach enables the future integration of knowledge from other
sources, and consequently its future expansion. It is an addition and/or an alternative
to existing broad-coverage lexical-semantic resources for Portuguese.
The aforementioned contributions are described in the following scientific publications,
presented at national and international events, including some highly selective
ones. Together with the description of the publication venue, we present, when
available, its acceptance rate and ERA ranking¹:
• Automatic extraction of semantic relations from Portuguese definitions in
collaboratively created resources – Wikipedia, first, and Wiktionary, second:
– Gonçalo Oliveira, H., Costa, H., and Gomes, P. (2010a). Extracção de conhecimento
léxico-semântico a partir de resumos da Wikipédia. In Actas do II
Simpósio de Informática, INFORUM 2010, pages 537–548, Braga, Portugal.
Universidade do Minho (40% acceptance rate)
¹ Conference ranking by the Excellence in Research for Australia, see
http://core.edu.au/index.php/categories/conference%20rankings/1 (August 2012)