24.07.2013 Views

Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...

7.2. Ontologising performance

• Only tb-triples supported by CETEMPúblico (Santos and Rocha, 2001), a newspaper corpus of Portuguese, were used. This was done based on the results of the automatic validation, as reported in section 4.2.5. We thus had some confidence in the quality of the triples, as their arguments co-occurred at least once in the corpus, connected by discriminating textual patterns for their relation.

• Triples with the following frequent but abstract arguments were discarded: acto (act), efeito (effect), acção (action), estado (state), coisa (thing), qualidade (quality), as well as tb-triples with arguments occurring fewer than 25 times in CETEMPúblico. Some of the frequent and abstract arguments have actually been considered “empty heads” (see section 3.2.1 of this thesis) since PAPEL 3.0. This means that, in the current version of PAPEL, there are no hypernymy tb-triples where these words are the hypernym.
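The two selection criteria above can be sketched as a simple filter. This is only an illustration under stated assumptions: the `(arg1, relation, arg2)` tuple layout, the `keep_triple` helper, the `supported` set and all frequency counts are hypothetical and not taken from the thesis; only the list of abstract words and the threshold of 25 occurrences come from the text.

```python
from collections import Counter

# Abstract arguments listed in the text.
ABSTRACT_ARGS = {"acto", "efeito", "acção", "estado", "coisa", "qualidade"}
MIN_OCCURRENCES = 25  # frequency threshold stated in the text


def keep_triple(triple, corpus_freq, supported):
    """Return True if a tb-triple (arg1, relation, arg2) passes both filters.

    `supported` is assumed to hold the triples validated against the corpus;
    `corpus_freq` is assumed to map a word to its corpus frequency.
    """
    arg1, _, arg2 = triple
    if triple not in supported:
        return False  # no corpus support for this triple
    if arg1 in ABSTRACT_ARGS or arg2 in ABSTRACT_ARGS:
        return False  # frequent but abstract argument
    return (corpus_freq[arg1] >= MIN_OCCURRENCES
            and corpus_freq[arg2] >= MIN_OCCURRENCES)


# Toy data: every count below is invented for illustration.
triples = [
    ("animal", "hypernym-of", "cão"),
    ("coisa", "hypernym-of", "mesa"),     # abstract argument
    ("roda", "part-of", "carro"),
    ("zigoto", "part-of", "embrião"),     # argument below the threshold
    ("pedra", "purpose-of", "voar"),      # not supported by the corpus
]
corpus_freq = Counter({
    "animal": 900, "cão": 700, "coisa": 5000, "mesa": 800, "roda": 300,
    "carro": 2000, "zigoto": 3, "embrião": 40, "pedra": 600, "voar": 450,
})
supported = set(triples[:4])

kept = [t for t in triples if keep_triple(t, corpus_freq, supported)]
print(kept)
```

Using a `Counter` means that a word absent from the frequency table counts as zero occurrences and is therefore filtered out.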

Furthermore, we unified all meronymy relations (part-of, member-of, contained-in, material-of) in a unique type, part-of. This option relied on the fact that the distinction between different meronymy subtypes is sometimes too fine-grained, and because, as happens for English (Ittoo and Bouma, 2010), for Portuguese there are textual patterns that might be used to denote more than one subtype.
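As a rough sketch, the unification of meronymy subtypes amounts to relabelling the relation of each tb-triple. The tuple layout and the function names are hypothetical, and dropping triples that become identical after the merge is an assumption made here, not something the text states.

```python
# Meronymy subtypes named in the text, all mapped to the single type part-of.
MERONYMY_SUBTYPES = {"part-of", "member-of", "contained-in", "material-of"}


def unify_relation(relation):
    """Map any meronymy subtype to 'part-of'; leave other relations as-is."""
    return "part-of" if relation in MERONYMY_SUBTYPES else relation


def unify_triples(triples):
    """Relabel triples, keeping the first occurrence of each result.

    Removing duplicates is an assumption of this sketch.
    """
    relabelled = ((a, unify_relation(r), b) for a, r, b in triples)
    return list(dict.fromkeys(relabelled))  # order-preserving de-duplication


# Toy input, invented for illustration.
raw = [
    ("dedo", "part-of", "mão"),
    ("jogador", "member-of", "equipa"),
    ("dedo", "member-of", "mão"),        # collapses onto the first triple
    ("animal", "hypernym-of", "cão"),
]
unified = unify_triples(raw)
print(unified)
```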

Attachments

From the previous selection of tb-triples, we chose those held between words included in at least one TePOT synset, and whose attachment raised no doubts. It was possible to have tb-triples where all possible attachments were correct, as well as tb-triples without a plausible attachment, because the sense of one of the arguments was not covered by the thesaurus.
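To make the notion of a possible attachment concrete, the sketch below assumes that a possible attachment of a tb-triple is any pair of synsets in which the first contains the first argument and the second contains the second argument. That reading, the toy thesaurus and the helper names are assumptions for illustration; they are not definitions quoted from this section.

```python
from itertools import product


def synsets_with(word, thesaurus):
    """Return every synset of the thesaurus that contains `word`."""
    return [synset for synset in thesaurus if word in synset]


def possible_attachments(triple, thesaurus):
    """Enumerate candidate synset pairs for a tb-triple (arg1, relation, arg2)."""
    arg1, relation, arg2 = triple
    pairs = product(synsets_with(arg1, thesaurus), synsets_with(arg2, thesaurus))
    return [(first, relation, second) for first, second in pairs]


# Toy thesaurus, invented for illustration: "banco" appears in two synsets.
thesaurus = [
    frozenset({"banco", "mocho"}),
    frozenset({"banco", "casa bancária"}),
    frozenset({"móvel", "mobília"}),
]

candidates = possible_attachments(("móvel", "hypernym-of", "banco"), thesaurus)
for first, relation, second in candidates:
    print(sorted(first), relation, sorted(second))
```

Under this reading, the example yields two possible attachments, of which a human annotator would mark as plausible only the one whose synset carries the intended sense of the argument.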

For each tb-triple, the gold reference contained all plausible attachments, as in the examples of figure 7.5. In the end, the gold reference consisted of 452 tb-triples and their possible attachments, with those that were plausible marked. Table 7.1 shows the distribution of tb-triples per relation type, the average number of possible attachments, and the average number of plausible attachments. The proportion of plausible attachments per tb-triple can be seen as the random chance of selecting a plausible attachment from the possible alternatives. This number is between 40%, for hypernymy, and 50%, for purpose-of.

Relation       tb-triples   Attachments
                            Avg(possible)   Avg(plausible)
Hypernym-of    210          13.7            5.5 (40.2%)
Part-of        175          11.2            5.5 (49.5%)
Purpose-of     67           13.5            6.8 (50.1%)

Table 7.1: Matching possibilities in the gold resource.
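The random-chance figure can be computed in two ways, and the text does not say which one produced the percentages in Table 7.1: as the mean of the per-triple proportions, or as the ratio between the two averages. The sketch below shows both on invented counts and then applies the second to the rounded averages of the table, which gives values close to, but not identical with, the reported percentages. Function names and toy counts are hypothetical.

```python
def chance_per_triple(counts):
    """Mean over tb-triples of plausible / possible attachments."""
    return sum(plausible / possible for possible, plausible in counts) / len(counts)


def chance_pooled(counts):
    """Ratio of total plausible to total possible attachments
    (equal to the ratio of the two averages)."""
    total_possible = sum(possible for possible, _ in counts)
    total_plausible = sum(plausible for _, plausible in counts)
    return total_plausible / total_possible


# Invented (possible, plausible) counts for three tb-triples.
toy_counts = [(10, 5), (20, 4), (4, 4)]
print(chance_per_triple(toy_counts), chance_pooled(toy_counts))

# Rounded averages from Table 7.1: (Avg(possible), Avg(plausible)).
table_7_1 = {
    "Hypernym-of": (13.7, 5.5),
    "Part-of": (11.2, 5.5),
    "Purpose-of": (13.5, 6.8),
}
table_ratios = {
    relation: round(100 * plausible / possible, 1)
    for relation, (possible, plausible) in table_7_1.items()
}
print(table_ratios)
```

The two estimates differ whenever tb-triples have very different numbers of possible attachments, so the small gaps to the reported percentages may come either from rounding of the averages or from the per-triple computation.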

7.2.2 Performance comparison<br />

In order to compare the performance of the algorithms, we used them to ontologise the 452 tb-triples in the gold reference into the candidate synsets. However, instead
