Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...
76 Chapter 4. Acquisition of Semantic Relations
Looking at the agreement numbers, we notice that there is good (Green, 1997) or substantial (Landis and Koch, 1977) agreement in the classification of synonymy, hypernymy and causation relations. On the other hand, the relations with lower classification agreement (fair and moderate) are also those that are semantically less well-defined. We have already mentioned the problem of judging member-of triples. We actually present two values for their agreement: the first considers the four possible classifications, while the second treats member-of as generic meronymy, which means that the triples classified as 3 are counted as if they had been classified as 2. This way, agreement is higher, but still lower than for synonymy, hypernymy and causation. Another source of noise for the member-of relation is that it sometimes overlaps with hypernymy. For instance, bear is a hyponym of mammal, but is it also a member of the class of mammals?
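The effect of merging classification 3 into classification 2 can be illustrated with a small sketch. It assumes a Cohen-style kappa between two judges, which is what the Landis and Koch (1977) scale is normally applied to; this passage does not state which coefficient was used. The judge labels below are invented for illustration and are not the thesis data, and the function names are hypothetical.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two equally long sequences of labels."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    # Observed agreement: share of items on which both judges agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of the judges' marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both judges used one and the same label throughout
    return (observed - expected) / (1 - expected)


def landis_koch(kappa):
    """Verbal label for a kappa value on the Landis and Koch (1977) scale."""
    if kappa < 0:
        return "poor"
    for upper, name in [(0.20, "slight"), (0.40, "fair"),
                        (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return name
    return "almost perfect"


def collapse(labels, source=3, target=2):
    """Treat every `source` classification as if it were `target`."""
    return [target if x == source else x for x in labels]


# Invented classifications (0-3) by two hypothetical judges.
judge_a = [2, 3, 3, 2, 0, 1, 3, 2]
judge_b = [3, 3, 2, 2, 1, 1, 2, 2]

four_way = cohen_kappa(judge_a, judge_b)
merged = cohen_kappa(collapse(judge_a), collapse(judge_b))
print(f"four classes: {four_way:.2f} ({landis_koch(four_way)})")
print(f"3 merged into 2: {merged:.2f} ({landis_koch(merged)})")
```

On this toy data most disagreements are 2-versus-3 confusions, so merging the two classes removes them and the coefficient rises, which mirrors the reasoning behind reporting a second agreement value for member-of.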
We have also mentioned the abstraction problem of the property-of relation and the underspecification problem that occurs especially for property-of and for purpose-of. Another problem that contributes to lower agreement on the classification of purpose-of relations is related to the relaxed semantic constraints on its arguments. This relation may connect very different things. Just to give an idea, it relates an action (a verb), which can be either a general purpose (e.g. to fry, to disinfect, to calculate, to censor, to dissociate) or just something one can do with the other argument (e.g. to punish, to transport, to climb, to spend, to entertain), to, for instance, an instrument (e.g. frying pan, disinfectant, whip), a concrete object (e.g. van, stairs), an abstract means (e.g. credit, calculation, satire), a human entity (e.g. clown), or a property (e.g. dissociation).
We conclude by noting that these results are, to some extent, comparable to, though higher than, those obtained in the creation of MindNet (Richardson et al., 1993), where a sample of 250 relations of 25 different types, extracted from the Longman Dictionary of Contemporary English, was hand-checked for correctness. The overall reported accuracy was 78%. It is also reported that the highest accuracy, of about 87%, was obtained for hypernymy, and that part-of was the least reliable relation, only 15% accurate.
4.3 Discussion
We have presented the first step towards the automatic creation of a wordnet-like lexical ontology for Portuguese. After explaining how semantic relations are acquired from dictionaries, we described the creation of CARTÃO, the successor of PAPEL and thus the largest term-based lexical-semantic network for Portuguese.
CARTÃO can be browsed using the Folheador interface (Gonçalo Oliveira et al., 2012b; Costa, 2011), designed to facilitate the navigation of Portuguese LKBs represented as tb-triples. Folheador is connected to the online services VARRA (Freitas et al., 2012) and AC/DC (Santos and Bick, 2000; Santos, 2011), which query corpora to provide authentic examples of the relations in context. VARRA is designed not only to search for tb-triples in context, but also to discover new discriminating patterns for each relation, and to identify good and bad examples of each tb-triple. The examples might be useful for understanding and evaluating the triple. An exercise using VARRA to validate part of PAPEL 2.0 is described in Freitas et al. (2012).
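As a rough illustration of what navigating an LKB represented as tb-triples involves, the sketch below stores triples of the form (term, relation, term) and indexes them by term. It is not how Folheador or VARRA are implemented; the class and method names are hypothetical, the argument order of each relation is an assumption, and the English terms are borrowed from the examples earlier in this chapter rather than taken from CARTÃO.

```python
from collections import defaultdict
from typing import NamedTuple


class TBTriple(NamedTuple):
    """A term-based triple: two terms connected by a named relation."""
    arg1: str
    relation: str
    arg2: str


class TripleIndex:
    """Index tb-triples by every term they mention (hypothetical helper)."""

    def __init__(self, triples):
        self._by_term = defaultdict(set)
        for triple in triples:
            self._by_term[triple.arg1].add(triple)
            self._by_term[triple.arg2].add(triple)

    def lookup(self, term, relation=None):
        """Triples mentioning `term`, optionally restricted to one relation."""
        hits = self._by_term.get(term, set())
        return sorted(t for t in hits
                      if relation is None or t.relation == relation)


# Illustrative triples only; argument order is an assumption.
triples = [
    TBTriple("mammal", "hypernym-of", "bear"),
    TBTriple("to fry", "purpose-of", "frying pan"),
    TBTriple("to entertain", "purpose-of", "clown"),
]

index = TripleIndex(triples)
for triple in index.lookup("frying pan"):
    print(triple.arg1, triple.relation, triple.arg2)
```

Indexing by both arguments lets a browser show every relation a term takes part in, regardless of the side on which it appears; passing `relation` narrows the view to a single relation type.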