Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...

Chapter 9. Final discussion

3. Moving from term-based to synset-based semantic relations, without using the extraction context (chapter 7).

Also, even though the procedure for extracting semantic relations from dictionaries cannot be seen as novel, in chapter 4 of this thesis we have presented work on the comparison of the structure and contents of different dictionaries of Portuguese. For instance, we have shown that many regularities are kept across the definitions of each dictionary, which enabled us to use the same grammars for extracting information from all three dictionaries.

Starting with a set of extracted semantic relations, and combining the aforementioned procedures in the order in which they were presented, we proposed ECO, a flexible approach for creating a wordnet-like lexical ontology automatically from text. ECO was used for Portuguese but, considering that different methods can be used for the relation extraction step, it is language independent.

During this work, each of the previous procedures was used in the construction of several lexical-semantic resources. These resources, listed below, are in the public domain and may be used together with applications that we hope will contribute to advancing the state of the art in the computational processing of Portuguese:

• CARTÃO: the largest term-based lexical-semantic network for Portuguese, larger than PAPEL, which it includes together with relations extracted from two other dictionaries (chapter 4).

• CLIP: the first fuzzy thesaurus for Portuguese, completely extracted from dictionaries (chapter 5).

• TRIP: the largest synset-based thesaurus for Portuguese, larger than TeP, which it includes together with synonymy information acquired automatically from dictionaries (chapter 6).

• Onto.PT: a new wordnet-like lexical ontology for Portuguese, extracted automatically from textual resources, which covers more than 100,000 concepts (represented as synsets) and more than 170,000 semantic relations (chapter 8). Currently, Onto.PT contains information from five lexical resources, but the ECO approach enables the future integration of knowledge from other sources, and consequently its future expansion. It is an addition and/or an alternative to existing broad-coverage lexical-semantic resources for Portuguese.

The aforementioned contributions are described in the following scientific publications, presented at national and international events, including some highly selective ones. Together with the description of the publication venue, we present, when available, its acceptance rate and ERA ranking¹:

• Automatic extraction of semantic relations from Portuguese definitions in collaboratively created resources – Wikipedia, first, and Wiktionary, second:

– Gonçalo Oliveira, H., Costa, H., and Gomes, P. (2010a). Extracção de conhecimento léxico-semântico a partir de resumos da Wikipédia. In Actas do II Simpósio de Informática, INFORUM 2010, pages 537–548, Braga, Portugal. Universidade do Minho (40% acceptance rate)

¹ Conference ranking by the Excellence in Research for Australia, see http://core.edu.au/index.php/categories/conference%20rankings/1 (August 2012)
