24.07.2013 Views

Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...


Chapter 5. Synset Discovery

the membership degree of the word in vj to Fi, µFi(vj).

5. For each cluster Fi with all elements included in a larger cluster Fj (Fi ∪ Fj = Fj and Fi ∩ Fj = Fi), Fi and Fj are merged, giving rise to a new cluster Fk with the same elements as Fj, where the membership degrees of the common elements are summed, µFk(vj) = µFi(vj) + µFj(vj).
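The merging in step 5 can be sketched as follows. This is only an illustration under stated assumptions: clusters are represented as dictionaries from word to membership degree, the function name `merge_included_clusters` is hypothetical, and a cluster included in several larger clusters is merged into the first one found, a case the text does not specify.

```python
def merge_included_clusters(clusters):
    """Merge each cluster whose words are all contained in another cluster.

    `clusters` is a list of dicts mapping word -> membership degree
    (an assumed representation, not taken from the original text).
    """
    # Visit larger clusters first so that they can act as merge targets.
    ordered = sorted(clusters, key=len, reverse=True)
    merged = []
    for cluster in ordered:
        for target in merged:
            # Fi is included in Fj when every word of Fi is also in Fj.
            if set(cluster) <= set(target):
                # Sum the membership degrees of the common elements.
                for word, degree in cluster.items():
                    target[word] += degree
                break
        else:
            # No including cluster found: keep a copy as its own cluster.
            merged.append(dict(cluster))
    return merged


example = [
    {"a": 0.5, "b": 0.5},
    {"a": 0.2, "b": 0.3, "c": 0.5},
    {"d": 1.0},
]
result = merge_included_clusters(example)
```

In this example the first cluster is absorbed by the second, so two clusters remain, and the words `a` and `b` carry the summed degrees.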

Figure 5.2 shows the normalised clustering matrix C for the network in Figure 5.1, together with the resulting fuzzy clusters. Similarities are computed with the cosine similarity measure, as follows:

$$
\mathrm{sim}(a, b) = \cos(\vec{v}_a, \vec{v}_b) = \frac{\vec{v}_a \cdot \vec{v}_b}{|\vec{v}_a|\,|\vec{v}_b|} = \frac{\sum_{i=0}^{|V|} v_{ai} \times v_{bi}}{\sqrt{\sum_{i=0}^{|V|} v_{ai}^{2}} \times \sqrt{\sum_{i=0}^{|V|} v_{bi}^{2}}} \tag{5.1}
$$

v1    v2    v3    v4    v5    v6    v7    v8    v9    v10
0.27  0.17  0.17  0.17  0.14  0.06  0.00  0.00  0.00  0.00
0.21  0.33  0.16  0.16  0.13  0.00  0.00  0.00  0.00  0.00
0.21  0.16  0.33  0.16  0.13  0.00  0.00  0.00  0.00  0.00
0.21  0.16  0.16  0.33  0.13  0.00  0.00  0.00  0.00  0.00
0.13  0.10  0.10  0.10  0.25  0.14  0.10  0.07  0.00  0.00
0.07  0.00  0.00  0.00  0.11  0.32  0.23  0.08  0.09  0.09
0.00  0.00  0.00  0.00  0.17  0.29  0.41  0.14  0.00  0.00
0.00  0.00  0.00  0.00  0.08  0.07  0.10  0.28  0.24  0.24
0.00  0.00  0.00  0.00  0.00  0.09  0.00  0.27  0.32  0.32
0.00  0.00  0.00  0.00  0.00  0.09  0.00  0.27  0.32  0.32

• F1,2,3,4,5 = {v1(0.94), v2(1), v3(1), v4(1), v5(0.68), v6(0.19), v7(0.17), v8(0.08)}

• F1,5,6,7,8,9,10 = {v1(0.06), v5(0.32), v6(0.81), v7(0.83), v8(0.92), v9(1), v10(1)}

Figure 5.2: Clustering matrix C after normalisation and resulting fuzzy sets.
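As a minimal sketch of the cosine measure in equation 5.1, the function below computes the similarity of two word vectors. It assumes the vectors are plain Python lists of equal length. Returning 0 when a vector has zero norm is an assumption made here to avoid a division by zero, not something stated in the text.

```python
import math


def cosine_similarity(va, vb):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(va, vb))
    norm_a = math.sqrt(sum(x * x for x in va))
    norm_b = math.sqrt(sum(y * y for y in vb))
    if norm_a == 0 or norm_b == 0:
        # Assumption: a vector with no non-zero entries is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


# Two hypothetical vectors that share one of their two non-zero entries.
similarity = cosine_similarity([1, 1, 0, 0], [1, 0, 1, 0])
```

Vectors with the same non-zero positions and values get a similarity of 1, and vectors with no position in common get 0.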

If µFi(va) > 0, the word va has a sense that shares a meaning with the other words in Fi. The membership degree µFi(va) may be seen as the confidence in the use of the word va with the meaning of the synset Fi.

Also, step 3 of the algorithm is optional. In a normalised C, all membership degrees of the same word sum up to 1, Σ_i µFi(vj) = 1. Therefore, the membership degrees of a word a can also be interpreted as the possible senses of a and the likelihood of a conveying each of their meanings. However, normalising C will make highly connected words have low memberships.
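The effect of normalisation on highly connected words can be illustrated with a small sketch. It assumes fuzzy clusters are stored as dictionaries from word to membership degree. The function name `normalise_memberships` and the toy data are hypothetical, and the code only reproduces the stated property that the memberships of one word sum to 1, not the exact procedure of step 3.

```python
def normalise_memberships(clusters):
    """Rescale memberships so that, for each word, they sum to 1 over all clusters."""
    totals = {}
    for cluster in clusters:
        for word, degree in cluster.items():
            totals[word] = totals.get(word, 0.0) + degree
    return [
        {
            word: degree / totals[word]
            for word, degree in cluster.items()
            if totals[word] > 0  # skip words whose memberships are all zero
        }
        for cluster in clusters
    ]


# Hypothetical data: "hub" occurs in three clusters, the other words in one each.
toy_clusters = [
    {"hub": 0.9, "x": 0.9},
    {"hub": 0.9, "y": 0.9},
    {"hub": 0.9, "z": 0.9},
]
normalised = normalise_memberships(toy_clusters)
hub_total = sum(cluster["hub"] for cluster in normalised)
```

After normalisation, `hub` has a membership of about one third in each cluster, while `x`, `y` and `z` each keep a membership of 1 in their only cluster.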

In order to obtain simple synsets from fuzzy synsets Fi, one just has to apply a threshold θ to the membership degrees, so that all words a with membership lower than θ, µFi(va) < θ, are excluded from the synset. In this case, attention should be paid when C is normalised, as using the same θ for all fuzzy synsets might prevent highly connected words from being included in any synset.
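Applying the threshold can be sketched as below, using the memberships of the fuzzy set F1,2,3,4,5 from Figure 5.2. The function name `to_simple_synset` and the value θ = 0.5 are chosen here for illustration only.

```python
def to_simple_synset(fuzzy_synset, theta):
    """Keep only the words whose membership degree is not lower than theta."""
    return {word for word, degree in fuzzy_synset.items() if degree >= theta}


# Membership degrees of F1,2,3,4,5 as listed in Figure 5.2.
f_12345 = {
    "v1": 0.94, "v2": 1.0, "v3": 1.0, "v4": 1.0,
    "v5": 0.68, "v6": 0.19, "v7": 0.17, "v8": 0.08,
}

# theta = 0.5 is a hypothetical value, not one suggested by the text.
synset = to_simple_synset(f_12345, theta=0.5)
print(sorted(synset))  # ['v1', 'v2', 'v3', 'v4', 'v5']
```

With this threshold, v6, v7 and v8 are dropped from the synset.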
