24.07.2013 Views

Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...


Chapter 7. Moving from term-based to synset-based relations

All the candidate synsets with the highest rp are added to a new set, C. If rp < θ, a predefined threshold, a is not attached. Otherwise, a is attached to the synset(s) of C with the highest ni. Term b is attached using the same procedure, but fixing a.

The RP algorithm is illustrated in figure 7.2, where it is used to ontologise the tb-triple {a R1 b}, given the candidate synsets and the network in figure 7.1.¹

rpA1 = 3/4* rpB1 = 3/3*<br />

rpA2 = 2/3 rpB2 = 2/3<br />

rpA3 = 1/3 rpB3 = 1/2<br />

rpA4 = 4/6<br />

max(rp(Ai, {a, R1, b})) = 3/4 → A1<br />

max(rp(Bi, {a, R1, b})) = 3/3 → B1<br />

resulting sb-triple = {A1 R1 B1}<br />

Figure 7.2: Using RP to select the suitable synsets for ontologising {a R1 b}, given the candidate synsets and the network N in figure 7.1.
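The selection step of RP can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the function name `select_by_rp`, the input format (a dict from a synset label to the pair (ni, |Ai|)) and the threshold value 0.5 are hypothetical. It also assumes that the numerator of each rp value in figure 7.2 is ni and uses the simplified denominator |Ai| described in footnote 1.

```python
from fractions import Fraction


def select_by_rp(candidates, theta):
    """Return the labels of the synsets a term should be attached to.

    candidates: dict mapping a synset label to (n_i, synset_size).
    theta: threshold below which the term is not attached at all.
    """
    # Simplified rp = n_i / |A_i| (assumption: see the lead-in).
    rp = {label: Fraction(n, size) for label, (n, size) in candidates.items()}
    best = max(rp.values())
    if best < theta:
        return []  # term is not attached
    # C holds every candidate that reaches the highest rp.
    C = [label for label, value in rp.items() if value == best]
    # Within C, keep the synset(s) with the highest n_i.
    top_n = max(candidates[label][0] for label in C)
    return [label for label in C if candidates[label][0] == top_n]


# Counts read off figure 7.2; the threshold is an arbitrary example value.
A = {"A1": (3, 4), "A2": (2, 3), "A3": (1, 3), "A4": (4, 6)}
B = {"B1": (3, 3), "B2": (2, 3), "B3": (1, 2)}
print(select_by_rp(A, 0.5), select_by_rp(B, 0.5))
```

Exact fractions are used so that ties such as 2/3 and 4/6 compare equal, which matters because the tie-break on ni only applies to synsets with identical rp.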

Average Cosine (AC): Assuming that related concepts are described by words related to the same concepts, this algorithm exploits all the relations in N to select the most similar pair of candidate synsets. A term adjacency matrix M (|V| × |V|) is first created based on N, where |V| is the number of nodes (terms). If the terms in indexes i and j are connected (related), Mij = 1, otherwise, Mij = 0.
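As an illustration of how M could be built, the sketch below constructs the matrix from a list of term-based triples. The function name `build_adjacency`, the triple format and the toy network are hypothetical (the network of figure 7.1 is not reproduced here), and treating connections as undirected is an assumption about what "connected" means.

```python
def build_adjacency(triples):
    """Return (terms, M), where M[i][j] = 1 if terms i and j are related.

    triples: iterable of (term_a, relation, term_b); the relation type is
    ignored because every relation in the network counts as a connection.
    """
    terms = sorted({t for a, _, b in triples for t in (a, b)})
    index = {term: i for i, term in enumerate(terms)}
    M = [[0] * len(terms) for _ in terms]
    for a, _, b in triples:
        i, j = index[a], index[b]
        M[i][j] = 1
        M[j][i] = 1  # assumption: connections are treated as undirected
    return terms, M


# Hypothetical toy network, not the one in figure 7.1.
triples = [("a", "R1", "b"), ("a", "R2", "c"), ("c", "R3", "d")]
terms, M = build_adjacency(triples)
for term, row in zip(terms, M):
    print(term, row)
```

Each row of M is the adjacency vector of one term, which is the representation used when comparing synsets.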

In order to ontologise a and b, the most similar pair of synsets, Ai ∈ A and Bj ∈ B, is selected according to the adjacencies of the terms they include. The similarity between Ai and Bj, represented by the adjacency vectors of their terms, Ai = {ai0, ..., ain}, n = |Ai| and Bj = {bj0, ..., bjm}, m = |Bj|, is given by the average similarity of each term aik with each term bjl, in N:

\[
\mathrm{sim}(A_i, B_j) = \frac{\sum_{k=1}^{|A_i|} \sum_{l=1}^{|B_j|} \cos(a_{ik}, b_{jl})}{|A_i|\,|B_j|} \tag{7.2}
\]

While this expression has been used to find similar nouns in a corpus (Caraballo, 1999), we adapted it to measure the similarity of two synsets, represented as the adjacency vectors of their terms.
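The average cosine of expression 7.2 and the choice of the most similar synset pair can be sketched as below. The names `cosine`, `sim` and `most_similar_pair`, the dict-based inputs and the toy vectors are hypothetical. Returning 0 for a term with an all-zero adjacency vector is an assumption, since the text does not say how isolated terms are handled.

```python
import math
from itertools import product


def cosine(u, v):
    """Cosine of the angle between two adjacency vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0  # assumption: isolated terms score 0


def sim(A_i, B_j, vec):
    """Average cosine over all term pairs of two synsets (expression 7.2).

    A_i, B_j: lists of terms; vec: dict mapping a term to its adjacency vector.
    """
    total = sum(cosine(vec[a], vec[b]) for a, b in product(A_i, B_j))
    return total / (len(A_i) * len(B_j))


def most_similar_pair(A, B, vec):
    """Return the (label_A, label_B) pair of candidate synsets with highest sim."""
    return max(product(A, B), key=lambda p: sim(A[p[0]], B[p[1]], vec))


# Hypothetical toy adjacency vectors and candidate synsets.
vec = {"x": [1, 0, 0], "y": [1, 0, 0], "z": [0, 1, 0], "w": [0, 0, 0]}
A = {"A1": ["x"], "A2": ["z"]}
B = {"B1": ["y"], "B2": ["w"]}
print(most_similar_pair(A, B, vec))
```

When several pairs share the highest similarity, `max` returns the first one encountered; how ties should be resolved is not specified in the text.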

The AC ontologising algorithm is illustrated in figure 7.3, where it is used to ontologise the tb-triple {a R1 b}, given the sample candidate synsets and the sample network in figure 7.1. The example shows that, in contrast to the RP algorithm, AC uses all the relations in the network (R1, R2 and R3), and not just those of the same type as the tb-triple being ontologised (R1).

¹ For the sake of simplicity, we ignored the 1 + log2(|Ai|) in the denominator of the rp expression, and considered it to be just the size of the synset, |Ai|.
