
Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...


Chapter 4. Acquisition of Semantic Relations

added to synsets are the target of a clustering algorithm, similar to the one presented in chapter 5.

• Chapter 7 proposes several algorithms for moving from term-based semantic relations to relations held between synsets, using only the extracted term-based relations and the discovered synsets.

• After presenting all the steps, chapter 8 shows how they can be combined in the ECO approach to reach our final goal, Onto.PT, a lexical ontology for Portuguese. The same chapter also provides an overview of the current version of Onto.PT.
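The chapter 7 step, moving from term-based relations to synset-based relations using only the extracted triples and the discovered synsets, can be illustrated with a minimal sketch. This is not presented as any of the algorithms proposed in chapter 7. It assumes a hypothetical data model (a triple is `(arg1, relation, arg2)`, a synset is a `frozenset` of terms) and a hypothetical scoring rule: a triple `(a, R, b)` is attached to the candidate synset pair whose members are linked by the most term-based triples with the same relation `R`. All names, terms, and relation labels are illustrative.

```python
from itertools import product

# Hypothetical data model: a term-based triple is (arg1, relation, arg2) and a
# synset is a frozenset of synonymous terms. Names are illustrative only.
Triple = tuple[str, str, str]
Synset = frozenset[str]


def synsets_containing(term: str, synsets: list[Synset]) -> list[int]:
    """Return the indices of all synsets that include `term` (it may be ambiguous)."""
    return [i for i, s in enumerate(synsets) if term in s]


def attach(triples: list[Triple], synsets: list[Synset]) -> set[tuple[int, str, int]]:
    """Map each term-based triple (a, R, b) to one synset-based triple (A, R, B).

    Each candidate pair (A, B), with a in A and b in B, is scored by how many
    term-based triples with relation R link a member of A to a member of B.
    The highest-scoring pair wins; ties keep the first pair found. Triples
    whose arguments appear in no synset are skipped.
    """
    # Index term pairs by relation so that scoring is a set lookup.
    pairs_by_relation: dict[str, set[tuple[str, str]]] = {}
    for a, rel, b in triples:
        pairs_by_relation.setdefault(rel, set()).add((a, b))

    attached: set[tuple[int, str, int]] = set()
    for a, rel, b in triples:
        best, best_score = None, -1
        for i, j in product(synsets_containing(a, synsets), synsets_containing(b, synsets)):
            score = sum(
                (x, y) in pairs_by_relation[rel] for x in synsets[i] for y in synsets[j]
            )
            if score > best_score:
                best, best_score = (i, rel, j), score
        if best is not None:
            attached.add(best)
    return attached
```

For example, with the synsets `{carro, automóvel}`, `{carro, vagão}` and `{roda}`, and the triples `("roda", "part_of", "carro")` and `("roda", "part_of", "automóvel")`, both triples end up attached to the pair (`{roda}`, `{carro, automóvel}`): the second triple supports that sense of the ambiguous `carro`. The scoring rule only uses the extracted triples and synsets, in the spirit of the constraint stated above, but it is a simplification.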

It is possible to integrate any kind of information, from any source, into Onto.PT, as long as it is represented as term-based triples. Still, given the goal of creating a broad-coverage lexical ontology, and despite some experiments using Wikipedia (Gonçalo Oliveira et al., 2010a), electronic dictionaries were our main target for exploitation, as in the MindNet project (Richardson et al., 1998; Vanderwende et al., 2005). As noted in section 2, language dictionaries are the main source of general lexical information about a language. They are structured around words and senses, and are more exhaustive in this respect than other textual resources. At the same time, they are systematic and thus easier to parse.
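Because the only requirement for integration is the term-based triple format, triples from different sources can be pooled into one collection. The sketch below is a minimal illustration under stated assumptions: the `TermTriple` type, the source names and the relation labels are all hypothetical, and recording which sources produced each triple is a design choice made here for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TermTriple:
    """Hypothetical term-based triple: two lexical items linked by a relation."""
    arg1: str
    relation: str
    arg2: str


def merge(sources: dict[str, list[TermTriple]]) -> dict[TermTriple, set[str]]:
    """Pool triples from several sources, keeping the set of sources behind each one."""
    merged: dict[TermTriple, set[str]] = defaultdict(set)
    for source_name, triples in sources.items():
        for triple in triples:
            # Frozen dataclasses are hashable, so identical triples collapse.
            merged[triple].add(source_name)
    return dict(merged)


# Usage with hypothetical sources: one dictionary and one Wikipedia extraction.
pooled = merge({
    "dictionary_a": [TermTriple("roda", "part_of", "carro"),
                     TermTriple("carro", "synonym_of", "automóvel")],
    "wikipedia": [TermTriple("roda", "part_of", "carro")],
})
```

Here the `part_of` triple is stored once, with both source names attached, which also gives a simple way to see how much two sources agree.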

This chapter describes the extraction of semantic relations from three Portuguese dictionaries, which resulted in the LKB named CARTÃO, a large lexical-semantic network for Portuguese. Part of the work presented here is also reported in Gonçalo Oliveira et al. (2011).

We start this chapter by introducing our approach to the acquisition of term-based relational triples from dictionary definitions. Then, we describe the work performed on the creation of CARTÃO, starting with a brief introduction to the dictionaries used, some issues regarding their parsing, and the structure of their definitions. After that, we present the contents of CARTÃO, compare the knowledge extracted from each of the three dictionaries, and evaluate it using different procedures. We end this chapter with a brief discussion on the utility of an LKB structured like CARTÃO.

4.1 Semantic relations from definitions

In our work, the extraction of semantic relations from dictionaries is based on a fixed set of handcrafted rules, as opposed to state-of-the-art bootstrapping algorithms that learn relations from a small set of seeds (see section 3.2.2). Although our approach is more time-consuming, especially the construction of the grammars, which must be manually adapted to new situations, this cost is not critical for dictionaries. As we will discuss in section 4.2.3, many regularities are preserved across definitions in the same dictionary, and even across different dictionaries. Definitions thus tend to use a simple vocabulary, which makes them easy to parse. Also, most bootstrapping algorithms rely heavily on redundancy in large collections of text, while dictionaries are smaller and much less redundant. Furthermore, our approach provides greater control over the discriminating patterns.
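To make the idea of handcrafted, discriminating patterns concrete, here is a minimal sketch. Note the substitution: the approach described here uses manually built grammars, whereas this sketch uses Python regular expressions for brevity. The three patterns, the relation names and the example definitions are hypothetical; they only mimic the kind of regular openings that Portuguese dictionary definitions often have, such as "o mesmo que", "espécie de" or "parte de".

```python
import re

# Hypothetical rules: (pattern applied to the start of a lower-cased
# definition, relation name). The captured group is the related term.
RULES = [
    (re.compile(r"o mesmo que (\w+)"), "synonym_of"),
    (re.compile(r"(?:espécie|tipo) de (\w+)"), "hyponym_of"),
    (re.compile(r"parte d[aoe]s? (?:um |uma )?(\w+)"), "part_of"),
]


def extract(headword: str, definition: str) -> list[tuple[str, str, str]]:
    """Apply every rule to one definition and return term-based triples."""
    text = definition.lower().strip()
    triples = []
    for pattern, relation in RULES:
        match = pattern.match(text)  # anchored at the start of the definition
        if match:
            triples.append((headword, relation, match.group(1)))
    return triples
```

For instance, `extract("roda", "Parte de um veículo que gira.")` yields `[("roda", "part_of", "veículo")]`. Each rule is explicit and can be inspected or tightened by hand, which is the kind of control over patterns mentioned above, at the cost of writing and maintaining the rules manually.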

The extraction of semantic relations is inspired by the construction of PAPEL, reported in Gonçalo Oliveira et al. (2009, 2010b), and consists of one manual step,
