Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...
7.2. Ontologising performance
• Only tb-triples supported by CETEMPúblico (Santos and Rocha, 2001), a
newspaper corpus of Portuguese, were used. This choice was based on the
results of the automatic validation reported in section 4.2.5. We thus had
some confidence in the quality of the triples, as their arguments co-occurred
at least once in the corpus, connected by discriminating textual patterns for
their relation.
• Triples with the following frequent but abstract arguments were discarded:
acto (act), efeito (effect), acção (action), estado (state), coisa (thing), qualidade
(quality), as well as tb-triples with arguments with fewer than 25 occurrences
in CETEMPúblico. Some of these frequent and abstract arguments have
actually been considered “empty heads” (see section 3.2.1 of this thesis)
since PAPEL 3.0. This means that, in the current version of PAPEL, there
are no hypernymy tb-triples where these words are the hypernym.
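The two selection criteria above can be read as a simple filter over candidate tb-triples. The following is only a minimal sketch under stated assumptions: the names `TbTriple`, `select_triples`, `supported` and `corpus_freq` are hypothetical, the example words and frequencies are made up, and the frequency threshold is assumed to apply to each argument individually.

```python
from typing import Dict, Iterable, List, NamedTuple, Set


class TbTriple(NamedTuple):
    """A term-based triple: arg1 RELATION arg2 (hypothetical representation)."""
    arg1: str
    relation: str
    arg2: str


# Frequent but abstract arguments listed in the text.
ABSTRACT_ARGS = frozenset({"acto", "efeito", "acção", "estado", "coisa", "qualidade"})
# Minimum number of corpus occurrences required for an argument (from the text).
MIN_OCCURRENCES = 25


def select_triples(triples: Iterable[TbTriple],
                   supported: Set[TbTriple],
                   corpus_freq: Dict[str, int]) -> List[TbTriple]:
    """Keep triples that are corpus-supported, have no abstract argument,
    and whose arguments both reach the frequency threshold (assumption)."""
    selected = []
    for t in triples:
        args = (t.arg1, t.arg2)
        if t not in supported:
            continue  # not validated against the corpus
        if any(a in ABSTRACT_ARGS for a in args):
            continue  # frequent but abstract argument
        if any(corpus_freq.get(a, 0) < MIN_OCCURRENCES for a in args):
            continue  # at least one argument is too rare
        selected.append(t)
    return selected


# Toy data, invented for illustration only.
candidates = [
    TbTriple("animal", "hypernym-of", "cão"),
    TbTriple("coisa", "hypernym-of", "mesa"),
    TbTriple("roda", "part-of", "carro"),
    TbTriple("planta", "hypernym-of", "árvore"),
]
supported = {candidates[0], candidates[1], candidates[2]}
freq = {"animal": 900, "cão": 400, "coisa": 5000, "mesa": 300,
        "roda": 12, "carro": 800, "planta": 100, "árvore": 100}

print(select_triples(candidates, supported, freq))
```

In this toy run only the first triple survives: the second has an abstract argument, the third has an argument below the threshold, and the fourth is not in the supported set.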
Furthermore, we unified all meronymy relations (part-of, member-of, contained-in,
material-of) into a single type, part-of. We took this option because the
distinction between meronymy subtypes is sometimes too fine-grained and because,
as happens for English (Ittoo and Bouma, 2010), in Portuguese there are
textual patterns that might be used to denote more than one subtype.
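As a small illustration of this unification, the mapping below collapses the four meronymy subtypes named in the text into part-of and leaves other relation names untouched. The function name `unify_relation` and the exact relation strings are assumptions made for the sketch.

```python
# Meronymy subtypes named in the text, all mapped to the single type "part-of".
MERONYMY_SUBTYPES = frozenset({"part-of", "member-of", "contained-in", "material-of"})


def unify_relation(relation: str) -> str:
    """Return "part-of" for any meronymy subtype, else the relation unchanged."""
    return "part-of" if relation in MERONYMY_SUBTYPES else relation


relations = ["member-of", "hypernym-of", "material-of", "purpose-of"]
print([unify_relation(r) for r in relations])
```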
Attachments<br />
From the previous selection of tb-triples, we chose those holding between words included
in at least one TePOT synset and whose attachment raised no doubts. Some
tb-triples had all their possible attachments correct, while others had
no plausible attachment, because the sense of one of the arguments
was not covered by the thesaurus.
For each tb-triple, the gold reference contained all plausible attachments, as in
the examples of figure 7.5. In the end, the gold reference consisted of 452 tb-triples
and their possible attachments, with the plausible ones marked. Table 7.1
shows the distribution of tb-triples per relation type, the average number of possible
attachments, and the average number of plausible attachments. The proportion of
plausible attachments per tb-triple can be seen as the random chance of selecting a
plausible attachment from the possible alternatives. This proportion ranges from about 40%
for hypernymy to about 50% for purpose-of.
Relation      tb-triples   Attachments
                           Avg(possible)   Avg(plausible)
Hypernym-of   210          13.7            5.5 (40.2%)
Part-of       175          11.2            5.5 (49.5%)
Purpose-of     67          13.5            6.8 (50.1%)
Table 7.1: Matching possibilities in the gold resource.
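The random-chance figure can be estimated from per-triple counts of possible and plausible attachments. The sketch below averages the per-triple ratio of plausible to possible attachments; this is one plausible reading, not necessarily how the percentages in Table 7.1 were computed, and the function name and the toy counts are hypothetical.

```python
from statistics import mean
from typing import Iterable, Tuple


def random_chance(attachment_counts: Iterable[Tuple[int, int]]) -> float:
    """Mean, over tb-triples, of plausible / possible attachments.

    Each item is a (possible, plausible) pair for one tb-triple.
    Triples with no possible attachment are skipped.
    """
    ratios = [plausible / possible
              for possible, plausible in attachment_counts
              if possible > 0]
    return mean(ratios)


# Toy counts, invented for illustration: (possible, plausible) per tb-triple.
toy_counts = [(4, 1), (2, 1)]
print(f"{random_chance(toy_counts):.3f}")
```

A baseline that picks one attachment uniformly at random for each tb-triple would be expected to select a plausible one with this probability.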
7.2.2 Performance comparison<br />
To compare the performance of the algorithms, we used them to ontologise
the 452 tb-triples in the gold reference into the candidate synsets. However, instead