Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...
Chapter 7. Moving from term-based to synset-based relations
the tb-triples, DLP is proprietary, which means we cannot use the context of the
tb-triples of PAPEL. Not to mention that several definitions are too short to
provide enough context. Given this limitation, together with the need to map often
non-matching (Dolan, 1994; Peters et al., 1998) word sense definitions in different
resources, and to define extraction contexts for different heterogeneous resources,
we decided to ontologise without using the extraction context. This enables the creation
of IE systems with two completely independent modules: (i) one responsible
for extracting tb-triples; and (ii) another for ontologising them. In other words, the
second module attaches each term in a triple to a concept, represented, for instance,
as a synset in a broad-coverage lexical ontology. We believe that this approach is an
interesting way of coping with information sparsity, since it allows for the extraction
of knowledge from different heterogeneous sources (e.g. dictionaries, encyclopedias,
corpora), and provides a way to harmoniously integrate all the extracted information
in a common knowledge base.
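The separation into two independent modules can be pictured as two functions that share nothing but the tb-triple data type. The following Python sketch is only an illustration under stated assumptions: the names `TbTriple`, `SbTriple`, `extract` and `ontologise`, the toy "X is a Y" extraction pattern, and the naive first-match attachment are all hypothetical, and are not the algorithms proposed in this chapter.

```python
from typing import FrozenSet, Iterable, List, NamedTuple, Optional

Synset = FrozenSet[str]  # a synset is modelled as a set of synonymous terms


class TbTriple(NamedTuple):
    """Term-based triple {a R b}: both arguments are plain terms."""
    a: str
    relation: str
    b: str


class SbTriple(NamedTuple):
    """Synset-based triple: both arguments are synsets."""
    a: Synset
    relation: str
    b: Synset


def extract(sentences: Iterable[str]) -> List[TbTriple]:
    """Module (i): toy extractor that only handles 'X is a Y' (assumption)."""
    triples = []
    for sentence in sentences:
        words = sentence.lower().rstrip(".").split()
        if len(words) == 4 and words[1:3] == ["is", "a"]:
            triples.append(TbTriple(words[0], "is-a", words[3]))
    return triples


def first_synset(term: str, thesaurus: Iterable[Synset]) -> Optional[Synset]:
    """Return the first synset containing the term, or None if there is none."""
    for synset in thesaurus:
        if term in synset:
            return synset
    return None


def ontologise(triples: Iterable[TbTriple],
               thesaurus: List[Synset]) -> List[SbTriple]:
    """Module (ii): naive placeholder that never looks at extraction context."""
    result = []
    for t in triples:
        sa, sb = first_synset(t.a, thesaurus), first_synset(t.b, thesaurus)
        if sa is not None and sb is not None:
            result.append(SbTriple(sa, t.relation, sb))
    return result


# Usage: the second module sees only tb-triples and the thesaurus.
thesaurus = [frozenset({"dog", "hound"}), frozenset({"mammal"})]
sb_triples = ontologise(extract(["dog is a mammal"]), thesaurus)
```

Because `ontologise` receives only tb-triples, the extractor could be swapped for one that reads dictionaries, encyclopedias or corpora without changing the second module.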
In this chapter, we propose several algorithms for moving from tb-triples to
synset-based relational triples (hereafter, sb-triples), taking advantage of nothing
but the existing synsets and a lexical network with tb-triples. We start by presenting
the algorithms and then we describe how they were evaluated and compared. The
performance results supported the choice of this kind of algorithm in the creation
of Onto.PT. Also, given that the ontologising algorithms result in a set of synsets
related among themselves by semantic relations, they are suitable for the last step
of the ECO approach for creating wordnets. The core of this part of the work was
originally reported in Gonçalo Oliveira and Gomes (2012a). Its earlier stages had
been reported in Gonçalo Oliveira and Gomes (2011c).
7.1 Ontologising algorithms
Our work on ontologising semantic relations is similar to that presented by Pennacchiotti
and Pantel (2006). The main difference is that those authors
ontologise the semantic relations into WordNet, and exploit its structure, including
synsets and existing synset-relations. We, on the other hand, aimed to ontologise
into a synset-base without synset-relations (TeP), so we had to find alternatives,
such as exploiting all the extracted information.
The goal of the proposed algorithms is to ontologise tb-triples, {a R b}, in the
synsets of a thesaurus T. Instead of considering the context from which the triples were
extracted, or the synset glosses, they exploit the information in a given lexical
network N to select the best candidate synsets. A lexical network is established
by a set of tb-triples, and is defined as a graph, N = (V, E), with |V| nodes and
|E| edges. Each node v_i ∈ V represents a term, and each edge connecting v_i and
v_j, E(v_i, v_j), indicates that one of the meanings of the term in v_i is related to one
meaning of the term in v_j. Furthermore, edges may be labelled according to the
type of relationship held, E(v_i, v_j, R).
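The definition of a lexical network translates directly into a small data structure. The sketch below is a minimal illustration, assuming that tb-triples are given as (a, R, b) tuples; the class name `LexicalNetwork`, its attributes, the relation labels in the usage lines, and the choice to treat adjacency as undirected are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # a tb-triple {a R b} stored as (a, R, b)


class LexicalNetwork:
    """Graph N = (V, E) established by a set of tb-triples."""

    def __init__(self, triples: Iterable[Triple]) -> None:
        self.nodes: Set[str] = set()     # V: one node per term
        self.edges: Set[Triple] = set()  # E: labelled edges E(v_i, v_j, R)
        self._adjacent: Dict[str, Set[str]] = defaultdict(set)
        for a, relation, b in triples:
            self.nodes.update((a, b))
            self.edges.add((a, relation, b))
            # Adjacency is kept undirected here (an assumption of this sketch).
            self._adjacent[a].add(b)
            self._adjacent[b].add(a)

    def neighbours(self, term: str) -> Set[str]:
        """Terms directly connected to the given term by any relation."""
        return set(self._adjacent.get(term, set()))


# Usage with illustrative relation labels; the duplicate triple adds no edge.
network = LexicalNetwork([
    ("dog", "hyponym-of", "animal"),
    ("tail", "part-of", "dog"),
    ("dog", "hyponym-of", "animal"),
])
```

Storing the edges as a set means that a triple extracted more than once contributes a single labelled edge, so |E| counts distinct triples.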
By default, when a lexical network is needed, it is created from the tb-triples
given as input. So, the proposed algorithms are better suited to ontologise large
amounts of knowledge at once. Still, when there are few input tb-triples, they can
exploit an external and larger lexical network or, eventually, the ontology to which the
triples are being attached, if it already contains ontologised triples.
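The choice of the source for the lexical network could be sketched as follows. This is a hypothetical illustration only: the function name `network_triples`, the parameter `min_input_size` and its default value are assumptions, since the text does not state how many input tb-triples count as few.

```python
from typing import List, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]  # a tb-triple {a R b} stored as (a, R, b)


def network_triples(input_triples: Sequence[Triple],
                    external_triples: Optional[Sequence[Triple]] = None,
                    min_input_size: int = 1000) -> List[Triple]:
    """Pick the tb-triples from which the lexical network is built.

    By default the input triples are used. If they are few (fewer than the
    assumed threshold) and an external, larger set is available, that
    external set is used instead.
    """
    if external_triples is not None and len(input_triples) < min_input_size:
        return list(external_triples)
    return list(input_triples)


# Usage: a single input triple falls back to the external network.
few = [("dog", "hyponym-of", "animal")]
external = [("dog", "hyponym-of", "animal"), ("tail", "part-of", "dog")]
chosen = network_triples(few, external, min_input_size=2)
```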