24.07.2013 Views

Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...

Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...

Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

74 Chapter 4. Acquisition <strong>of</strong> Semantic Relations<br />

<strong>of</strong> error for <strong>the</strong> correct relations, with a confidence level <strong>of</strong> 95% 14 (ME(2)). The<br />

same table presents <strong>the</strong> agreement between <strong>the</strong> judges, quantified as <strong>the</strong> number <strong>of</strong><br />

classification matches (IAA) and as <strong>the</strong> Kappa value (κ, see Cohen (1960); Carletta<br />

(1996)), where <strong>the</strong> amount <strong>of</strong> agreement expected by chance is removed, because<br />

<strong>the</strong> probability <strong>of</strong> each judge giving a certain classification is considered.<br />

Relation Judge<br />

n synonym-<strong>of</strong> n<br />

v synonym-<strong>of</strong> v<br />

n hypernym-<strong>of</strong> n<br />

n member-<strong>of</strong> n<br />

v causation-<strong>of</strong> n<br />

v purpose-<strong>of</strong> n<br />

adj property-<strong>of</strong> v<br />

Resource<br />

DLP DA Wikt.<strong>PT</strong><br />

0 1 2 0 1 2 0 1 2<br />

J1 0% 0% 100% 1% 0% 99% 1% 0% 99%<br />

J2 0% 1% 99% 1% 0% 99% 2% 0% 98%<br />

J1 0% 0% 100% 1% 0% 99% 5% 0% 95%<br />

J2 1% 1% 98% 01% 0% 99% 5% 2% 93%<br />

J1 2% 4% 94% 5% 3% 92% 4% 12% 84%<br />

J2 3% 6% 91% 9% 7% 84% 4% 8% 88%<br />

J1 4% 0% 93%+3% 7% 5% 85%+3% 12% 5% 79%+4%<br />

J2 4% 3% 76%+17% 12% 3% 42%+43% 18% 14% 54%+14%<br />

J1 4% 1% 95% 1% 3% 96% 7% 10% 83%<br />

J2 4% 6% 90% 4% 5% 91% 7% 7% 86%<br />

J1 31% 0% 69% 25% 1% 74% 24% 1% 75%<br />

J2 29% 2% 69% 20% 1% 79% 25% 0% 75%<br />

J1 23% 5% 72% 18% 11% 71% 26% 5% 69%<br />

J2 12% 1% 78% 12% 10% 78% 15% 1% 75%<br />

Table 4.11: Results <strong>of</strong> <strong>the</strong> manual evaluation <strong>of</strong> tb-triples according to resource.<br />

Relation Judge<br />

n synonym-<strong>of</strong> n<br />

v synonym-<strong>of</strong> v<br />

n hypernym-<strong>of</strong> n<br />

n member-<strong>of</strong> n<br />

v causation-<strong>of</strong> n<br />

v purpose-<strong>of</strong> n<br />

adj property-<strong>of</strong> v<br />

Total<br />

0 1 2<br />

J1 2 (1%) 0 298 (99%) 1.1%<br />

J2 3 (1%) 1 (≈0) 296 (99%) 1.1%<br />

J1 6 (2%) 0 294 (98%) 1.6%<br />

J2 7 (2%) 3 (1%) 290 (97%) 1.9%<br />

J1 11 (4%) 19 (6%) 270 (90%) 3.4%<br />

J2 16 (5%) 21 (7%) 263 (88%) 3.7%<br />

J1 23 (8%) 10 (3%) 257+10 (86%+3%) 3.5%<br />

J2 34 (11%) 20 (7%) 172+74(57%+25%) 4.3%<br />

J1 12 (4%) 14 (5%) 274 (91%) 3.2%<br />

J2 15 (5%) 18 (6%) 267 (89%) 3.5%<br />

J1 80 (27%) 2 (1%) 218 (73%) 4.9%<br />

J2 74 (25%) 3 (1%) 223 (74%) 4.9%<br />

J1 67 (22%) 21 (7%) 212 (71%) 5.1%<br />

J2 39 (13%) 30 (10%) 231 (77%) 4.7%<br />

ME (2) IAA κ<br />

Table 4.12: Results <strong>of</strong> <strong>the</strong> manual evaluation <strong>of</strong> tb-triples.<br />

0.99 0.66<br />

0.98 0.68<br />

0.93 0.64<br />

0.67/0.88 0.32/0.55<br />

0.93 0.60<br />

0.79 0.48<br />

0.81 0.56<br />

Manual validation showed that synonymy relations are <strong>the</strong> most reliable in<br />

CARTÃO, as <strong>the</strong>ir accuracy is always close to 100%. This was somehow expected,<br />

because <strong>the</strong>se relations are also <strong>the</strong> easiest to extract (see section 4.2.3). The lowest<br />

accuracy <strong>of</strong> synonymy was that <strong>of</strong> <strong>the</strong> triples extracted from Wiktionary.<strong>PT</strong> (93%-<br />

95%). This happens mainly due to parsing errors on <strong>the</strong> wikitext indicating <strong>the</strong> kinds<br />

<strong>of</strong> verb (e.g. transitive, intransitive). This problem has however been corrected for<br />

fur<strong>the</strong>r versions <strong>of</strong> CARTÃO. About 90% <strong>of</strong> <strong>the</strong> hypernymy triples are correct. Of <strong>the</strong> incorrect ones, most<br />

connect related words. This happens especially in Wiktionary, where definitions, as<br />

<strong>the</strong> following, result in <strong>the</strong> extraction <strong>of</strong> hypernyms and not members or synonymys:<br />

14 The calculation <strong>of</strong> <strong>the</strong> margins <strong>of</strong> error assumed that <strong>the</strong> evaluated samples were selected<br />

randomly, which is not exactly what happened. Although <strong>the</strong> triples in <strong>the</strong> samples were selected<br />

randomly, <strong>the</strong>re was a constraint to make each sample contain exactly 100 tb-triples from each<br />

resource, and <strong>the</strong> resources have different sizes.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!