24.07.2013 Views

Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...


Chapter 7. Moving from term-based to synset-based relations

the tb-triples, DLP is proprietary, which means we cannot use the context of the tb-triples of PAPEL. Moreover, several definitions are too short to provide enough context. Given this limitation, together with the need to map word sense definitions that often do not match across resources (Dolan, 1994; Peters et al., 1998), and to define extraction contexts for heterogeneous resources, we decided to ontologise without using the extraction context. This enables the creation of IE systems with two completely independent modules: (i) one responsible for extracting tb-triples; and (ii) another for ontologising them. In other words, the second module attaches each term in a triple to a concept, represented, for instance, as a synset in a broad-coverage lexical ontology. We believe that this approach is an interesting way of coping with information sparsity, since it allows for the extraction of knowledge from different heterogeneous sources (e.g. dictionaries, encyclopedias, corpora), and provides a way to harmoniously integrate all the extracted information in a common knowledge base.
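The separation between the two modules can be sketched as follows. This is only an illustration under stated assumptions: `extract_triples`, `candidate_synsets` and `ontologising_candidates` are hypothetical names, the extractor is a stub that returns hard-coded sample triples, and the thesaurus is made up. The second step merely lists the synsets that contain each term; choosing among those candidates is the task of the ontologising algorithms, which this sketch does not attempt.

```python
def extract_triples(source_text):
    """Module (i): stub extractor. A real one would parse dictionaries or corpora."""
    # Hard-coded sample output, for illustration only.
    return [("animal", "hypernym-of", "dog"), ("tail", "part-of", "dog")]


def candidate_synsets(term, thesaurus):
    """Return every synset of the thesaurus that contains the term."""
    return [synset for synset in thesaurus if term in synset]


def ontologising_candidates(triples, thesaurus):
    """Module (ii), first step only: pair each triple with its candidate synsets.

    It receives nothing but the triples and the thesaurus, so it does not
    depend on how, or from which text, the triples were extracted.
    """
    return [
        (candidate_synsets(a, thesaurus), relation, candidate_synsets(b, thesaurus))
        for a, relation, b in triples
    ]


# Made-up thesaurus: each synset is a set of synonymous terms.
thesaurus = [
    frozenset({"dog", "hound"}),
    frozenset({"dog", "scoundrel"}),
    frozenset({"animal", "beast"}),
    frozenset({"tail"}),
]

result = ontologising_candidates(extract_triples(""), thesaurus)
for a_cands, relation, b_cands in result:
    print(len(a_cands), relation, len(b_cands))
```

In this toy data the term "dog" belongs to two synsets, so both triples leave an ambiguity that the second module has to resolve without any access to the extraction context.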

In this chapter, we propose several algorithms for moving from tb-triples to synset-based relational triples (hereafter, sb-triples), taking advantage of nothing but the existing synsets and a lexical network with tb-triples. We start by presenting the algorithms and then we describe how they were evaluated and compared. The performance results supported the choice of this kind of algorithm in the creation of Onto.PT. Also, given that the ontologising algorithms result in a set of synsets related among themselves by semantic relations, they are suitable for the last step of the ECO approach for creating wordnets. The core of this part of the work was originally reported in Gonçalo Oliveira and Gomes (2012a). Its earlier stages had been reported in Gonçalo Oliveira and Gomes (2011c).

7.1 Ontologising algorithms

Our work on ontologising semantic relations is similar to that presented by Pennacchiotti and Pantel (2006). The main difference is that those authors ontologise the semantic relations into WordNet and exploit its structure, including synsets and existing synset relations. We, on the other hand, aimed to ontologise into a synset base without synset relations (TeP), so we had to find alternatives, such as exploiting all the extracted information.

The goal of the proposed algorithms is to ontologise tb-triples, {a R b}, in the synsets of a thesaurus T. Instead of considering the context from which the triples were extracted, or the synset glosses, they exploit the information in a given lexical network N to select the best candidate synsets. A lexical network is established by a set of tb-triples, and is defined as a graph, N = (V, E), with |V| nodes and |E| edges. Each node v_i ∈ V represents a term, and each edge connecting v_i and v_j, E(v_i, v_j), indicates that one of the meanings of the term in v_i is related to one meaning of the term in v_j. Furthermore, edges may be labelled according to the type of relationship held, E(v_i, v_j, R).
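As an illustration of this definition, the following is a minimal sketch of how a lexical network could be built from tb-triples. The names `TbTriple` and `build_network` are assumptions made for this example, the sample triples are made up, and keeping an undirected neighbour index next to the labelled edges is a design choice of the sketch; none of it is taken from the Onto.PT implementation.

```python
from collections import defaultdict
from typing import NamedTuple


class TbTriple(NamedTuple):
    """A term-based triple {a R b}: term a is related to term b by relation R."""
    a: str
    relation: str
    b: str


def build_network(triples):
    """Build N = (V, E): nodes are terms, edges are labelled E(v_i, v_j, R)."""
    nodes = set()
    edges = set()
    neighbours = defaultdict(set)  # term -> terms it shares an edge with
    for t in triples:
        nodes.update((t.a, t.b))
        edges.add((t.a, t.b, t.relation))
        # Assumption: adjacency is treated as undirected for neighbour lookups;
        # the direction of each relation is still kept in `edges`.
        neighbours[t.a].add(t.b)
        neighbours[t.b].add(t.a)
    return nodes, edges, dict(neighbours)


# Made-up sample triples, for illustration only.
triples = [
    TbTriple("animal", "hypernym-of", "dog"),
    TbTriple("tail", "part-of", "dog"),
    TbTriple("animal", "hypernym-of", "cat"),
]
nodes, edges, neighbours = build_network(triples)
print(len(nodes), len(edges))
print(sorted(neighbours["dog"]))
```

Because nodes are terms rather than senses, an ambiguous term is a single node whose edges may stem from different meanings, which is exactly why the candidate synsets still have to be selected afterwards.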

By default, when a lexical network is needed, it is created from the tb-triples given as input. The proposed algorithms are therefore better suited to ontologising large amounts of knowledge at once. Still, when there are few input tb-triples, they can exploit an external and larger lexical network or, possibly, the ontology to which the triples are being attached, if that ontology already contains ontologised triples.
