24.07.2013 Views

Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...

7.1. Ontologising algorithms

Each algorithm can be seen as a different strategy for attaching the terms a and b of a tb-triple {a R b} to suitable synsets Ai ∈ T and Bj ∈ T, where Ai = {ai0, ai1, ..., ain}, Bj = {bj0, bj1, ..., bjm}, n = |Ai| and m = |Bj|. The result is an sb-triple {Ai R Bj}. All the algorithms presented below start by collecting the candidate synsets from the thesaurus T: those containing term a, A ⊆ T : ∀(Ai ∈ A) → a ∈ Ai, and those containing term b, B ⊆ T : ∀(Bj ∈ B) → b ∈ Bj. Also, in all of the proposed algorithms, if T does not contain a term argument of a tb-triple (e.g. a), a new synset containing only that term is created (e.g. Sa = {a}).
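The shared first step described above can be sketched in Python. This is only a minimal illustration, not the implementation used in Onto.PT: the function name `candidate_synsets`, the representation of the thesaurus as a list of frozensets, and the toy terms are all assumptions made for the example.

```python
def candidate_synsets(thesaurus, term):
    """Return every synset in `thesaurus` that contains `term`.

    If no synset contains the term, a singleton synset {term} is created,
    added to the thesaurus, and returned as the only candidate.
    """
    candidates = [synset for synset in thesaurus if term in synset]
    if not candidates:
        singleton = frozenset({term})
        thesaurus.append(singleton)
        candidates = [singleton]
    return candidates


# Toy thesaurus T (hypothetical terms, for illustration only).
T = [
    frozenset({"car", "automobile"}),
    frozenset({"car", "carriage", "coach"}),
    frozenset({"wheel"}),
]

A = candidate_synsets(T, "car")   # two synsets contain "car"
B = candidate_synsets(T, "tyre")  # absent from T, so {"tyre"} is created
```

Here A holds the two synsets that contain "car", while B holds the newly created singleton, which is also appended to T.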

Before presenting the algorithms, we introduce figure 7.1, which shows candidate synsets for attaching terms a and b, as well as a made-up lexical network N. In N, the nodes, identified by letters, can be seen as lexical items (terms), while the connections represent tb-triples of a labelled type (R1, R2 and R3). Note that N intentionally omits some of the lexical items in the synsets (k to p), as happens when items do not occur in any tb-triple. Both the synsets and the network of figure 7.1 are used to illustrate some of the algorithms. The example was deliberately constructed so that the resulting sb-triple differs depending on the algorithm used.

Sample candidate synsets:<br />

A1 = {a, c, d, k} B1 = {b, g, h}<br />

A2 = {a, e, l} B2 = {b, f, o}<br />

A3 = {a, m, n} B3 = {b, p}<br />

A4 = {a, c, d, e, i, j}<br />

Sample network N:<br />

Figure 7.1: Candidate synsets and lexical network for the ontologising examples.

Related Proportion (RP): This algorithm relies on an assumption similar to that of the anchor approach of Pennacchiotti and Pantel (2006). First, to attach term a, term b is fixed. For each synset Ai ∈ A, ni is the number of terms aik ∈ Ai such that the triple {aik R b} holds. The related proportion rp is computed as follows:

\[
rp(A_i, \{a, R, b\}) = \frac{n_i}{1 + \log_2(|A_i|)} \tag{7.1}
\]
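As a rough illustration of equation 7.1, the following Python sketch computes rp for the candidate synsets A1 to A4 of figure 7.1. The edges of network N are not available here, so the set `triples` below is purely hypothetical, and the function name `related_proportion` is likewise an assumption. The count ni follows the definition literally, so term a itself is counted when {a R b} is in the network.

```python
import math


def related_proportion(synset, relation, b, triples):
    """Equation 7.1: n_i / (1 + log2(|A_i|)).

    n_i is the number of terms in `synset` for which the tb-triple
    (term, relation, b) is present in `triples`.
    """
    n_i = sum(1 for term in synset if (term, relation, b) in triples)
    return n_i / (1 + math.log2(len(synset)))


# Candidate synsets for term a, as listed in figure 7.1.
A1 = frozenset("acdk")
A2 = frozenset("ael")
A3 = frozenset("amn")
A4 = frozenset("acdeij")

# Hypothetical tb-triples; NOT the actual edges of network N.
triples = {
    ("a", "R1", "b"),
    ("c", "R1", "b"),
    ("d", "R1", "b"),
    ("e", "R1", "b"),
}

scores = {
    name: related_proportion(synset, "R1", "b", triples)
    for name, synset in [("A1", A1), ("A2", A2), ("A3", A3), ("A4", A4)]
}
```

With these assumed triples, A1 has three related terms out of four, giving 3 / (1 + log2 4) = 1.0. The logarithm in the denominator dampens the advantage of larger synsets, which would otherwise accumulate related terms simply by having more members.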
