Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...
Finally, the pair of synsets (Ai, Bj), such that Ai and Bj maximise PR(Ai) and
PR(Bj) respectively, is selected.
Minimum Distance (MD): This algorithm assumes that related synsets contain
terms that are close in N. For this purpose, it selects the closest pair of synsets,
given the average (edge-based) distance of their terms:
\[
dist(A_i, B_j) = \frac{\sum_{k=1}^{|A_i|} \sum_{l=1}^{|B_j|} dist(a_{ik}, b_{jl})}{|A_i|\,|B_j|} \tag{7.7}
\]
The minimum distance between two nodes is the number of edges in the shortest
path between them, computed using Dijkstra's algorithm (Dijkstra, 1959). If a
term in a synset (aik or bjl) is not in N, it is removed from Ai or Bj before
this calculation. If this algorithm were applied to ontologise the tb-triple
{a R1 b}, given the situation of figure 7.1, there would be several ties for the best
pair of synsets, because this network is simpler than most real networks.
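The MD selection can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that N is stored as a dictionary mapping each term to the set of its neighbours, and that candidate synsets are tuples of terms. Because every edge counts as one step, breadth-first search is used here in place of Dijkstra's algorithm; on an unweighted network both give the same distances. Two further choices are assumptions that the text does not specify: an unreachable pair of terms is given infinite distance, and a synset left empty after removing unknown terms also yields infinite distance. The function names are hypothetical.

```python
from collections import deque
from itertools import product
from math import inf


def shortest_path_lengths(graph, source):
    """Number of edges from source to every reachable node (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def average_distance(graph, synset_a, synset_b):
    """Average pairwise term distance between two synsets, following equation 7.7."""
    # Terms that are not in the network are removed before the calculation.
    a_terms = [t for t in synset_a if t in graph]
    b_terms = [t for t in synset_b if t in graph]
    if not a_terms or not b_terms:
        return inf  # assumption: a synset with no known terms cannot be scored
    total = 0
    for a in a_terms:
        lengths = shortest_path_lengths(graph, a)
        for b in b_terms:
            total += lengths.get(b, inf)  # assumption: unreachable means infinite
    return total / (len(a_terms) * len(b_terms))


def minimum_distance(graph, candidates_a, candidates_b):
    """Return the pair of candidate synsets with the smallest average distance."""
    return min(product(candidates_a, candidates_b),
               key=lambda pair: average_distance(graph, *pair))
```

In this sketch, ties are broken by taking the first pair encountered, which is an arbitrary choice; the ties mentioned above for figure 7.1 would need an explicit policy.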
7.2 Ontologising performance
For Portuguese, TeP is the only freely available lexical resource with relations between synsets.
However, these relations are limited to antonymy, which is not a very prototypical
semantic relation. Therefore, in order to quantify the performance of the algorithms
presented in the previous section, and to compare them for ontologising different relations,
we manually created a gold reference, consisting of a set of tb-triples extracted
from dictionaries and their plausible attachments to the synsets of two handcrafted
thesauri. Only afterwards did we use TeP as a gold resource, applying the algorithms to
ontologise antonymy tb-triples.
This section starts by describing the resources involved in the creation of our
handcrafted gold reference and reports the results obtained with each algorithm for ontologising
hypernymy, part-of and purpose-of tb-triples. Then, we present the results
of ontologising antonymy tb-triples in TeP.
7.2.1 Gold reference<br />
The gold reference for this evaluation consisted of the synsets of TeP 2.0 and
OpenThesaurus.PT, to which samples of tb-triples from PAPEL 2.0 were attached.
Synsets<br />
In order to eliminate the noise from automatic procedures, we decided to include
only synsets from handcrafted thesauri in our gold reference. As mentioned in the
previous chapters, for Portuguese, there are currently two free handcrafted broad-coverage
thesauri: TeP 2.0 (Maziero et al., 2008) and OpenThesaurus.PT (OT.PT).
TeP is the largest by far (see section 3.1.2) and was created by experts, so its synsets
were the best alternative for our gold reference. However, TeP was created for