Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...
Chapter 4. Acquisition of Semantic Relations
added to synsets are the target of a clustering algorithm, similar to the one
presented in chapter 5.
• Chapter 7 proposes several algorithms for moving from term-based semantic
relations to relations held between synsets, using only the extracted term-based
relations and the discovered synsets.
• After presenting all the steps, chapter 8 shows how they can be combined in the
ECO approach, in order to reach our final goal, Onto.PT, a lexical ontology
for Portuguese. In the same chapter, an overview of the current version of
Onto.PT is provided.
It is possible to integrate any kind of information, from any source, in Onto.PT,
as long as it is represented as term-based triples. Still, given the goal of
creating a broad-coverage lexical ontology, and despite some experiments using
Wikipedia (Gonçalo Oliveira et al., 2010a), electronic dictionaries were our main
target for exploitation, as in the MindNet project (Richardson et al., 1998; Vanderwende
et al., 2005). As mentioned in section 2, language dictionaries are the main
source of general lexical information about a language. They are structured around words
and senses and are more exhaustive in this respect than other textual resources. At
the same time, they are systematic and thus easier to parse.
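As an illustration of why the triple representation makes integration source-independent, the following minimal sketch models a term-based triple and merges triples coming from two sources. The names `Triple` and `merge_sources`, the relation labels, and the sample data are all assumptions made for this example; they are not the format or the code used for Onto.PT.

```python
from typing import Iterable, List, NamedTuple


class Triple(NamedTuple):
    """A term-based relational triple: arg1 RELATION arg2 (hypothetical layout)."""
    arg1: str
    relation: str
    arg2: str


def merge_sources(*sources: Iterable[Triple]) -> List[Triple]:
    """Combine triples from any number of sources.

    Exact duplicates are dropped and first-seen order is kept, so the
    origin of a triple does not matter once it is in this form.
    """
    seen = set()
    merged = []
    for source in sources:
        for triple in source:
            if triple not in seen:
                seen.add(triple)
                merged.append(triple)
    return merged


# Illustrative data only; the relation names are made up for this sketch.
from_dictionary = [
    Triple("animal", "HYPERNYM_OF", "cão"),
    Triple("carro", "SYNONYM_OF", "automóvel"),
]
from_encyclopedia = [
    Triple("animal", "HYPERNYM_OF", "cão"),  # duplicate of a dictionary triple
    Triple("roda", "PART_OF", "carro"),
]

network = merge_sources(from_dictionary, from_encyclopedia)
```

Because both sources share one representation, later steps only ever see a flat list of triples, regardless of where each one was extracted.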
This chapter describes the extraction of semantic relations from three Portuguese
dictionaries, which resulted in the LKB named CARTÃO, a large lexical-semantic
network for Portuguese. Part of the work presented here is also reported in Gonçalo
Oliveira et al. (2011).
We start this chapter by introducing our approach to the acquisition of term-based
relational triples from dictionary definitions. Then, we describe the work
performed on the creation of CARTÃO, starting with a brief introduction to
the dictionaries used, some issues with their parsing, and the structure of
their definitions. After that, we present the contents of CARTÃO, compare
the knowledge extracted from each of the three dictionaries, and evaluate it using
different procedures. We end this chapter with a brief discussion of the utility of an
LKB structured as CARTÃO.
4.1 Semantic relations from definitions
In our work, the extraction of semantic relations from dictionaries is based on a
fixed set of handcrafted rules, as opposed to state-of-the-art bootstrapping algorithms
that learn relations given a small set of seeds (see section 3.2.2).
Although our approach is more time-consuming, especially in the construction of
the grammars, which have to be manually adapted to new situations, this is not
critical for dictionaries. As we will discuss in section 4.2.3, many regularities are
preserved across definitions in the same dictionary, and even across different dictionaries.
The vocabulary thus tends to be simple and easy to parse. Also, most bootstrapping
algorithms rely heavily on redundancy in large collections of text, while dictionaries
are smaller and much less redundant. Furthermore, our approach provides greater
control over the discriminating patterns.
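To make the idea of a fixed set of handcrafted rules concrete, here is a minimal sketch in which each rule pairs a textual pattern over a definition with the relation it signals. It uses plain regular expressions instead of the grammars described in this chapter, and the patterns, relation labels, and the function name `extract` are assumptions chosen for illustration, not the actual rules.

```python
import re

# Hypothetical rules: (relation label, pattern over the definition text,
# whether the defined word is the first argument of the triple).
# Patterns and labels are assumptions for illustration only.
RULES = [
    ("SYNONYM_OF", re.compile(r"^o mesmo que (?P<term>\w+)"), False),
    ("PART_OF", re.compile(r"^parte d[eoa] (?P<term>\w+)"), True),
    ("HYPERNYM_OF", re.compile(r"^(?:tipo|espécie) de (?P<term>\w+)"), False),
]


def extract(headword, definition):
    """Return the (arg1, relation, arg2) triples that the rules find
    in one dictionary definition of `headword`."""
    text = definition.strip().lower()
    triples = []
    for relation, pattern, headword_first in RULES:
        match = pattern.match(text)
        if match is None:
            continue
        term = match.group("term")
        if headword_first:
            triples.append((headword, relation, term))
        else:
            triples.append((term, relation, headword))
    return triples


part = extract("pneu", "Parte da roda de um veículo")
synonym = extract("automóvel", "o mesmo que carro")
nothing = extract("cão", "animal doméstico")
```

Every extracted triple can be traced back to the single pattern that produced it, which is the kind of control over discriminating patterns that a rule-based approach offers; the cost is that a definition matching no rule, such as the last one, yields nothing until a new rule is written by hand.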
The extraction of semantic relations is inspired by the construction of PAPEL,
reported in Gonçalo Oliveira et al. (2009, 2010b), and consists of one manual step,