Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...
Chapter 1. Introduction
by the conceptual base, they establish a smaller synonymy network. This network
is finally exploited for the identification of word clusters, which can be
seen as new synsets.
3. Ontologisation: the lexical items in the arguments of the non-synonymy relation
instances are attached to suitable synsets. Once again, this is achieved
by exploiting the network established by all extracted relations, in order to,
given a relation instance, select the most similar pair of candidate synsets.
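As a rough illustration of the ontologisation step, the following Python sketch attaches a term-based relation instance to the pair of candidate synsets that is best supported by the other extracted relations. It is a minimal sketch under stated assumptions: the scoring function (the proportion of word pairs across the two synsets that are linked by an extracted instance of the same relation), the function names, the `is-a` label and the toy data are all hypothetical and are not taken from this thesis, which may use different similarity measures.

```python
from itertools import product

def support(synset_a, synset_b, rel, relations):
    """Assumed similarity: share of (x, y) word pairs, x in synset_a and
    y in synset_b, that appear as an extracted instance of `rel`."""
    hits = sum(1 for x, r, y in relations
               if r == rel and x in synset_a and y in synset_b)
    return hits / (len(synset_a) * len(synset_b))

def ontologise(instance, synsets, relations):
    """Map a term-based instance (a, rel, b) to a synset-based one.

    Returns (synset_of_a, rel, synset_of_b), or None when one of the
    terms is not covered by any synset."""
    a, rel, b = instance
    candidates_a = [s for s in synsets if a in s]
    candidates_b = [s for s in synsets if b in s]
    best = max(product(candidates_a, candidates_b),
               key=lambda pair: support(pair[0], pair[1], rel, relations),
               default=None)
    if best is None:
        return None
    return (best[0], rel, best[1])

# Toy data (hypothetical): "bank" is ambiguous between two synsets.
synsets = [
    frozenset({"bank", "shore"}),
    frozenset({"bank", "lender"}),
    frozenset({"institution", "organisation"}),
]
relations = [
    ("bank", "is-a", "institution"),
    ("lender", "is-a", "organisation"),
    ("shore", "is-a", "land"),
]

result = ontologise(("bank", "is-a", "institution"), synsets, relations)
```

In this toy example, the synset containing "bank" and "lender" is preferred over the one containing "bank" and "shore", because two extracted instances link it to the target synset instead of one. Note that the decision relies only on the extracted relation network, not on the text from which the instance was extracted.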
As the resulting resource is structured in synsets and semantic relations between
them, it can be seen as a wordnet. Given the three aforementioned steps, this
approach for creating wordnets automatically was named ECO, which stands
for Extraction, Clustering and Ontologisation.
1.3 Contributions<br />
Given our main goal, Onto.PT can be seen as the main contribution of this research.
Onto.PT is a wordnet-like lexical ontology for Portuguese, whose current
version integrates lexical-semantic knowledge from five lexical resources, more precisely
three dictionaries and two thesauri. After noticing that most of
the Portuguese lexical resources were to some extent complementary (Santos et al., 2010;
Teixeira et al., 2010), we integrated in Onto.PT those that were public.
The current version of Onto.PT contains more than 100,000 synsets and more
than 170,000 labelled connections, which represent semantic relations. This new
resource is a public alternative to existing Portuguese LKBs and can be used as a
wordnet. This means that, for Portuguese, Onto.PT can be used in most NLP tasks
that exploit the structure of a wordnet to achieve their goal, except for those that
use synset glosses, which are unavailable in Onto.PT.
But Onto.PT is not a static resource. It is created by a flexible three-step
approach, ECO, briefly described in the previous section. ECO enables the integration
of lexical-semantic knowledge from heterogeneous sources, and
can be used to create different instances of the resource, using different parameters.
Moreover, although applied only to the creation of Onto.PT, we propose ECO as an
approach that may be adopted in the creation or enrichment of wordnets in other
languages. It is thus another important contribution of this thesis.
Each step of ECO can also be seen individually as a contribution to the fields
of information extraction and automatic creation of wordnets. These steps include
procedures for:
1. Enriching an existing thesaurus with new synonyms.
2. Discovering synsets (or fuzzy synsets) from dictionary definitions.
3. Moving from term-based to synset-based semantic relations, without accessing
the extraction context.
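To make the second procedure more concrete, the sketch below groups synonymy pairs into clusters that can be read as candidate synsets. It is a deliberately simplified illustration: it treats each connected component of the synonymy network as one synset, which is an assumption made for this example and not the clustering method of this thesis. In particular, connected components merge the senses of ambiguous words and cannot produce fuzzy synsets. The function name and the word pairs are hypothetical.

```python
from collections import defaultdict

def discover_synsets(synonym_pairs):
    """Group words into candidate synsets by taking the connected
    components of the undirected synonymy network."""
    graph = defaultdict(set)
    for a, b in synonym_pairs:
        graph[a].add(b)
        graph[b].add(a)

    seen = set()
    synsets = []
    for start in graph:
        if start in seen:
            continue
        # Depth-first traversal collecting every word reachable from start.
        component = set()
        stack = [start]
        while stack:
            word = stack.pop()
            if word in component:
                continue
            component.add(word)
            stack.extend(graph[word] - component)
        seen |= component
        synsets.append(frozenset(component))
    return synsets

# Hypothetical synonymy pairs, as might be extracted from definitions.
pairs = [
    ("car", "automobile"),
    ("automobile", "motorcar"),
    ("big", "large"),
]
clusters = discover_synsets(pairs)
```

A more realistic procedure would need to split components around ambiguous words and, for fuzzy synsets, assign each word a membership degree rather than a hard assignment.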
On the other hand, the procedure for extracting semantic relations from dictionaries
cannot be seen as novel. Still, we have compared the structure and contents
in different dictionaries of Portuguese, which led to the conclusion that many regularities
are kept across the definitions of each dictionary. This comparison,