Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...
Chapter 7. Moving from term-based to synset-based relations
All the candidate synsets with the highest rp are added to a new set, C. If
rp < θ, a predefined threshold, a is not attached. Otherwise, a is attached to the
synset(s) of C with the highest ni. Term b is attached using the same procedure,
but fixing a.
The RP algorithm is illustrated in figure 7.2, where it is used to ontologise the
tb-triple {a R1 b}, given the candidate synsets and the network in figure 7.1.¹
rpA1 = 3/4*    rpB1 = 3/3*
rpA2 = 2/3     rpB2 = 2/3
rpA3 = 1/3     rpB3 = 1/2
rpA4 = 4/6
max(rp(Ai, {a, R1, b})) = 3/4 → A1<br />
max(rp(Bi, {a, R1, b})) = 3/3 → B1<br />
resulting sb-triple = {A1 R1 B1}<br />
Figure 7.2: Using RP to select the suitable synsets for ontologising {a R1 b}, given
the candidate synsets and the network N in figure 7.1.
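The selection step shown in figure 7.2 can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation used in this work: the function name `select_by_rp`, the representation of each candidate as a (numerator, synset size) pair, and the value of the threshold are assumptions made for the example. We take the numerator of each fraction in the figure to be ni and, as in the figure, we simplify the denominator of rp to the synset size.

```python
from fractions import Fraction


def select_by_rp(candidates, theta):
    """Pick the candidate synset(s) for one term of a tb-triple.

    candidates: dict mapping a synset label to (n_i, synset_size), where
    n_i is assumed to be the numerator of rp as shown in figure 7.2.
    theta: the predefined threshold (its value here is hypothetical).
    """
    # Simplified rp: n_i divided by the synset size.
    rp = {name: Fraction(n, size) for name, (n, size) in candidates.items()}
    best = max(rp.values())
    if best < theta:
        return []  # the term is not attached
    # C holds every candidate that reaches the highest rp.
    c = [name for name, value in rp.items() if value == best]
    # Ties in rp are resolved in favour of the highest n_i.
    top_n = max(candidates[name][0] for name in c)
    return [name for name in c if candidates[name][0] == top_n]


# Values copied from figure 7.2.
A = {"A1": (3, 4), "A2": (2, 3), "A3": (1, 3), "A4": (4, 6)}
B = {"B1": (3, 3), "B2": (2, 3), "B3": (1, 2)}

print(select_by_rp(A, Fraction(1, 2)))  # ['A1']
print(select_by_rp(B, Fraction(1, 2)))  # ['B1']
```

Exact fractions are used so that ties such as 2/3 and 4/6 compare as equal, which matters when several synsets share the highest rp.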
Average Cosine (AC): Assuming that related concepts are described by words
related to the same concepts, this algorithm exploits all the relations in N to select
the most similar pair of candidate synsets. A term adjacency matrix M (|V| × |V|)
is first created based on N, where |V| is the number of nodes (terms). If the terms
in indexes i and j are connected (related), Mij = 1, otherwise, Mij = 0.
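As a minimal sketch of how such a matrix could be built, the following Python function creates M from a list of tb-triples. The function name `adjacency_matrix`, the toy triples and the alphabetical ordering of the terms are assumptions made for illustration. We also assume that "connected" is symmetric, so both Mij and Mji are set, and that a term is not considered adjacent to itself unless the network says so.

```python
def adjacency_matrix(triples):
    """Build a term adjacency matrix from (term, relation, term) triples.

    The relation type is ignored: any relation between two terms makes
    them adjacent.
    """
    # Index the terms (nodes) in a fixed, reproducible order.
    terms = sorted({t for a, _, b in triples for t in (a, b)})
    index = {t: i for i, t in enumerate(terms)}
    m = [[0] * len(terms) for _ in terms]
    for a, _, b in triples:
        i, j = index[a], index[b]
        m[i][j] = 1
        m[j][i] = 1  # assumption: adjacency is undirected
    return terms, m


# Hypothetical toy network, not the one in figure 7.1.
toy_triples = [("a", "R1", "b"), ("b", "R2", "c")]
terms, M = adjacency_matrix(toy_triples)
print(terms)  # ['a', 'b', 'c']
print(M)      # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```

Row i of M is then the adjacency vector of the term with index i.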
In order to ontologise a and b, the most similar pair of synsets, Ai ∈ A and
Bj ∈ B, is selected according to the adjacencies of the terms they include. The
similarity between Ai and Bj, represented by the adjacency vectors of their terms,
Ai = {ai0, ..., ain}, n = |Ai| and Bj = {bj0, ..., bjm}, m = |Bj|, is given by the average
similarity of each term aik with each term bjl, in N:
$$\mathrm{sim}(A_i, B_j) = \frac{\displaystyle\sum_{k=1}^{|A_i|} \sum_{l=1}^{|B_j|} \cos(a_{ik}, b_{jl})}{|A_i|\,|B_j|} \tag{7.2}$$
While this expression has been used to find similar nouns in a corpus
(Caraballo, 1999), we adapted it to measure the similarity of two synsets, represented
as the adjacency vectors of their terms.
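Expression 7.2 and the selection of the most similar pair can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation used in this work: the function names, the toy adjacency vectors and the toy synsets are hypothetical, and we assume that the cosine involving an all-zero vector is 0.

```python
import math
from itertools import product


def cosine(u, v):
    """Cosine between two adjacency vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    # Assumption: a term with no adjacencies is similar to nothing.
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def sim(synset_a, synset_b, vectors):
    """Average cosine over all term pairs, as in expression 7.2."""
    total = sum(cosine(vectors[a], vectors[b])
                for a, b in product(synset_a, synset_b))
    return total / (len(synset_a) * len(synset_b))


def most_similar_pair(cands_a, cands_b, vectors):
    """Return the pair (Ai, Bj) of candidate synsets with the highest sim."""
    return max(product(cands_a, cands_b),
               key=lambda pair: sim(pair[0], pair[1], vectors))


# Hypothetical adjacency vectors (rows of M) for four terms.
vectors = {
    "a": [1, 1, 0, 0],
    "x": [1, 0, 0, 0],
    "b": [1, 1, 0, 0],
    "y": [0, 0, 1, 1],
}
A1, A2 = ("a", "x"), ("a",)   # candidate synsets containing a
B1, B2 = ("b",), ("b", "y")   # candidate synsets containing b

best_a, best_b = most_similar_pair([A1, A2], [B1, B2], vectors)
print(best_a, best_b)  # ('a',) ('b',)
```

In this toy setting, adding x to the synset of a lowers the average, because x shares only part of the adjacencies of b, so the singleton synsets form the most similar pair.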
The AC ontologising algorithm is illustrated in figure 7.3, where it is used to
ontologise the tb-triple {a R1 b}, given the sample candidate synsets and the sample
network in figure 7.1. The example shows that, in contrast to the RP algorithm,
AC uses all the relations in the network (R1, R2 and R3), and not just those of the
same type as the tb-triple being ontologised (R1).
¹ For the sake of simplicity, we ignored the 1 + log2(|Ai|) in the denominator of the rp expression,
and considered it to be just the size of the synset, |Ai|.