24.07.2013 Views

Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...


Chapter 4. Acquisition of Semantic Relations

Looking at the agreement numbers, we notice that there is good (Green, 1997) or substantial agreement (Landis and Koch, 1977) in the classification of synonymy, hypernymy and causation relations. On the other hand, the relations with lower classification agreement (fair and moderate) are also those that are less well-defined semantically. We have already mentioned the problem of judging member-of triples. We actually present two values for their agreement: the first considers the four possible classifications, while the second treats member-of as generic meronymy, which means that the triples classified as 3 are counted as if they had been classified as 2. This way, agreement is higher, but still lower than for synonymy, hypernymy and causation. Another source of noise for the member-of relation is that it sometimes overlaps with hypernymy. For instance, bear is a hyponym of mammal, but is it also a member of the class of mammals?
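The effect of treating class 3 as class 2 can be illustrated with a small sketch. It assumes that agreement is measured with Cohen's kappa (the Landis and Koch scale is usually applied to kappa, but the measure is not named in this section), that the four classifications are coded 0 to 3, and that the two lists of judgements are invented for illustration; none of these values come from the evaluation itself.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two judges who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Proportion of items on which both judges chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each judge's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both judges used one identical label throughout
    return (observed - expected) / (1 - expected)


def merge_labels(labels, source=3, target=2):
    """Count every `source` classification as if it had been `target`."""
    return [target if label == source else label for label in labels]


# Hypothetical judgements of ten member-of triples (codes 0-3 are assumed).
judge_1 = [2, 3, 2, 3, 0, 1, 2, 3, 0, 2]
judge_2 = [3, 3, 2, 2, 0, 0, 3, 3, 0, 2]

kappa_four = cohen_kappa(judge_1, judge_2)
kappa_merged = cohen_kappa(merge_labels(judge_1), merge_labels(judge_2))
print(f"four classes: {kappa_four:.3f}, 3 merged into 2: {kappa_merged:.3f}")
```

In this toy data most disagreements are between classes 2 and 3, so merging them raises kappa, which mirrors the behaviour described above for member-of.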

We have also mentioned the abstraction problem of the property-of relation and the underspecification problem that occurs especially for property-of and for purpose-of. Another problem that contributes to lower agreement on the classification of purpose-of relations is related to the relaxed semantic constraints on its arguments. This relation may connect very different things. Just to give an idea, it relates an action (verb), which can be either a general purpose (e.g. to fry, to disinfect, to calculate, to censor, to dissociate) or just something one can do (e.g. to punish, to transport, to climb, to spend, to entertain), with, for instance, an instrument (e.g. frying pan, disinfectant, whip), a concrete object (e.g. van, stairs), an abstract means (e.g. credit, calculation, satire), a human entity (e.g. clown), or a property (e.g. dissociation).

We conclude by noting that these results are, to some extent, comparable to, though higher than, those obtained in the creation of MindNet (Richardson et al., 1993), where a sample of 250 relations of 25 different types, extracted from the Longman Dictionary of Contemporary English, was hand-checked for correctness. The overall reported accuracy was 78%. It is also reported that the highest accuracy, of about 87%, was obtained for hypernymy, and that part-of was the least reliable relation, only 15% accurate.

4.3 Discussion

We have presented the first step towards the automatic creation of a wordnet-like lexical ontology for Portuguese. After explaining how semantic relations are acquired from dictionaries, we described the creation of CARTÃO, the successor of PAPEL and thus the largest term-based lexical-semantic network for Portuguese.

CARTÃO can be browsed using the interface Folheador (Gonçalo Oliveira et al., 2012b; Costa, 2011), designed to facilitate navigation of Portuguese LKBs represented as tb-triples. Folheador is connected to the online services VARRA (Freitas et al., 2012) and AC/DC (Santos and Bick, 2000; Santos, 2011), which query corpora to provide authentic examples of the relations in context. VARRA is designed not only to search for tb-triples in context, but also to discover new discriminating patterns for each relation, and to identify good and bad examples of each tb-triple. The examples might be useful for understanding and evaluating the triple. An exercise using VARRA to validate part of PAPEL 2.0 is described in Freitas et al. (2012).
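To make the idea of navigating an LKB represented as tb-triples more concrete, the following is a minimal sketch of a term-indexed triple store. It is not Folheador's implementation: the `TbTriple` type, the helper functions, the argument order and the English example triples (paired up from the examples given earlier in this chapter) are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class TbTriple(NamedTuple):
    """A term-based triple: two terms connected by a named relation."""
    arg1: str
    relation: str
    arg2: str


def build_index(triples):
    """Map each term to every triple in which it occurs as an argument."""
    index = defaultdict(list)
    for triple in triples:
        index[triple.arg1].append(triple)
        if triple.arg2 != triple.arg1:
            index[triple.arg2].append(triple)
    return index


def relations_of(index, term):
    """Group the triples that mention `term` by relation name."""
    grouped = defaultdict(list)
    for triple in index.get(term, []):
        grouped[triple.relation].append(triple)
    return dict(grouped)


# Hypothetical English triples; argument order is an assumption.
triples = [
    TbTriple("mammal", "hypernym-of", "bear"),
    TbTriple("to fry", "purpose-of", "frying pan"),
    TbTriple("to punish", "purpose-of", "whip"),
    TbTriple("to transport", "purpose-of", "van"),
]

index = build_index(triples)
for relation, found in relations_of(index, "bear").items():
    for triple in found:
        print(f"{triple.arg1} {relation} {triple.arg2}")
```

Indexing by both arguments lets a browser list every relation a term takes part in, regardless of which side of the triple it appears on.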
