24.07.2013

Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...


Finally, the pair of synsets $(A_i, B_j)$ such that $A_i$ and $B_j$ maximise $PR(A_i)$ and $PR(B_j)$, respectively, is selected.
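Because each synset is chosen independently of the other, this final step reduces to two argmax operations. The following is a minimal sketch, assuming the PR scores of all candidate synsets have already been computed by the earlier steps of the algorithm (not reproduced in this excerpt). The scores, the terms and the helper name `select_by_pagerank` are hypothetical.

```python
# Hypothetical type: a synset is a frozenset of terms, so it can key a dict.
Synset = frozenset[str]


def select_by_pagerank(pr: dict[Synset, float],
                       candidates_a: list[Synset],
                       candidates_b: list[Synset]) -> tuple[Synset, Synset]:
    """Pick, independently, the candidate synset of a and the candidate
    synset of b with the highest (precomputed) PR score."""
    best_a = max(candidates_a, key=lambda s: pr[s])
    best_b = max(candidates_b, key=lambda s: pr[s])
    return best_a, best_b


# Illustrative, made-up scores for two candidate synsets of each term.
pr = {
    frozenset({"a", "c"}): 0.12,
    frozenset({"a", "w"}): 0.30,
    frozenset({"b", "e"}): 0.25,
    frozenset({"b", "z"}): 0.05,
}
pair = select_by_pagerank(pr,
                          [frozenset({"a", "c"}), frozenset({"a", "w"})],
                          [frozenset({"b", "e"}), frozenset({"b", "z"})])
```

Here `pair` holds the synsets {a, w} and {b, e}. On ties, Python's `max` keeps the first candidate in list order, which is an arbitrary choice that the text does not specify.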

Minimum Distance (MD): This algorithm assumes that related synsets contain terms that are close in N. It therefore selects the closest pair of synsets, where closeness is given by the average (edge-based) distance between their terms:

$$\mathrm{dist}(A_i, B_j) = \frac{\sum_{k=1}^{|A_i|} \sum_{l=1}^{|B_j|} \mathrm{dist}(a_{ik}, b_{jl})}{|A_i|\,|B_j|} \tag{7.7}$$
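As a worked illustration of equation 7.7, consider hypothetical synsets $A_i = \{a_{i1}, a_{i2}\}$ and $B_j = \{b_{j1}, b_{j2}\}$ whose four term pairs are at distances 1, 2, 2 and 3 in N:

$$\mathrm{dist}(A_i, B_j) = \frac{\mathrm{dist}(a_{i1}, b_{j1}) + \mathrm{dist}(a_{i1}, b_{j2}) + \mathrm{dist}(a_{i2}, b_{j1}) + \mathrm{dist}(a_{i2}, b_{j2})}{|A_i|\,|B_j|} = \frac{1 + 2 + 2 + 3}{2 \cdot 2} = 2$$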

The minimum distance between two nodes is the number of edges in the shortest path between them, computed using Dijkstra's algorithm (Dijkstra, 1959). Any term of a synset ($a_{ik}$ or $b_{jl}$) that is not in N is removed from $A_i$ or $B_j$ before this calculation. If this algorithm were applied to ontologise the tb-triple {a R1 b} in the situation of figure 7.1, there would be several ties for the best pair of synsets, because that network is simpler than most real networks.
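The MD procedure can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the thesis implementation: N is assumed to be an undirected adjacency map built from hypothetical edges, synsets are plain tuples of terms, every edge has weight 1 (so Dijkstra's algorithm returns edge counts), and an unreachable term pair is assumed to count as an infinite distance. All tied pairs are returned rather than broken arbitrarily.

```python
import heapq
import math
from itertools import product

# Assumed representation of N: each term maps to its neighbouring terms.
Graph = dict[str, set[str]]


def build_graph(edges: list[tuple[str, str]]) -> Graph:
    """Build an undirected network from hypothetical term-term edges."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def shortest_distances(graph: Graph, source: str) -> dict[str, int]:
    """Dijkstra's algorithm with unit weights: number of edges on the
    shortest path from source to every reachable term."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale queue entry, a shorter path was already found
        for nxt in graph.get(node, set()):
            nd = d + 1  # every edge counts as one step
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist


def avg_distance(graph: Graph, syn_a: tuple[str, ...],
                 syn_b: tuple[str, ...]) -> float:
    """Equation 7.7 over the terms of both synsets that occur in N."""
    a_terms = [t for t in syn_a if t in graph]  # drop terms not in N
    b_terms = [t for t in syn_b if t in graph]
    if not a_terms or not b_terms:
        return math.inf  # assumption: nothing left to compare
    total = 0.0
    for a in a_terms:
        dist = shortest_distances(graph, a)
        for b in b_terms:
            total += dist.get(b, math.inf)  # assumption: unreachable = infinite
    return total / (len(a_terms) * len(b_terms))


def minimum_distance(graph, candidates_a, candidates_b):
    """Return the smallest average distance and every pair that reaches it."""
    scored = [(avg_distance(graph, A, B), A, B)
              for A, B in product(candidates_a, candidates_b)]
    best = min(score for score, _, _ in scored)
    return best, [(A, B) for score, A, B in scored if score == best]


# Hypothetical network and candidate synsets for the terms "a" and "b".
graph = build_graph([("a", "c"), ("c", "b"), ("a", "d"), ("d", "e"), ("e", "b")])
best, pairs = minimum_distance(graph,
                               [("a", "c"), ("a", "w")],
                               [("b", "e"), ("b", "z")])
print(best, pairs)  # 1.5 [(('a', 'c'), ('b', 'z'))]
```

With these hypothetical edges, the pair (("a", "c"), ("b", "z")) wins with an average of 1.5: "z" is not in N, so it is dropped and only the distances 2 (a to b) and 1 (c to b) are averaged. Running Dijkstra once per source term, rather than once per pair, avoids recomputing the same shortest paths for every term of the other synset.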

7.2 Ontologising performance

For Portuguese, TeP is the only freely available lexical resource with synset-relations. However, these relations are limited to antonymy, which is not a very prototypical semantic relation. Therefore, in order to quantify the performance of the algorithms presented in the previous section, and to compare them for ontologising different relations, we manually created a gold reference with a set of tb-triples extracted from dictionaries and their plausible attachments to the synsets of two handcrafted thesauri. Only afterwards did we use TeP as a gold resource for evaluating the algorithms on ontologising antonymy tb-triples.

This section starts by describing the resources involved in the creation of our handcrafted gold reference and then reports the results of each algorithm for ontologising hypernymy, part-of and purpose-of tb-triples. Finally, we present the results of ontologising antonymy tb-triples in TeP.

7.2.1 Gold reference<br />

The gold reference for this evaluation consisted of the synsets of TeP 2.0 and OpenThesaurus.PT, to which samples of tb-triples from PAPEL 2.0 were attached.

Synsets<br />

In order to eliminate the noise introduced by automatic procedures, we decided to include only synsets from handcrafted thesauri in our gold reference. As mentioned in the previous chapters, there are currently two free, handcrafted, broad-coverage thesauri for Portuguese: TeP 2.0 (Maziero et al., 2008) and OpenThesaurus.PT (OT.PT). TeP is by far the larger of the two (see section 3.1.2) and was created by experts, so its synsets were the best alternative for our gold reference. However, TeP was created for
