Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...
Chapter 5. Synset Discovery
the membership degree of the word in $v_j$ to $F_i$, $\mu_{F_i}(v_j)$.
5. For each cluster $F_i$ with all elements included in a larger cluster $F_j$ ($F_i \cup F_j = F_j$
and $F_i \cap F_j = F_i$), $F_i$ and $F_j$ are merged, giving rise to a new cluster $F_k$ with the
same elements as $F_j$, where the membership degrees of the common elements
are summed: $\mu_{F_k}(v_j) = \mu_{F_i}(v_j) + \mu_{F_j}(v_j)$.
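Step 5 can be illustrated with a minimal Python sketch. It assumes each cluster is stored as a dictionary from word to membership degree. The function name `merge_subclusters`, the example clusters, and the tie-breaking choice (a contained cluster is merged into the first, largest cluster that includes all of its words) are assumptions made for illustration, not part of the algorithm description.

```python
def merge_subclusters(clusters):
    """Merge every cluster whose words are all contained in another cluster.

    clusters: list of dicts mapping word -> membership degree.
    Returns a new list; the input dicts are not modified.
    """
    # Visit the largest clusters first, so that a containing cluster
    # is always seen before the clusters it contains.
    ordered = sorted(clusters, key=len, reverse=True)
    merged = []
    for cluster in ordered:
        words = set(cluster)
        # Assumption: merge into the first kept cluster that contains all words.
        host = next((m for m in merged if words <= set(m)), None)
        if host is None:
            merged.append(dict(cluster))
        else:
            # Membership degrees of the common elements are summed.
            for word, degree in cluster.items():
                host[word] += degree
    return merged


# Hypothetical clusters, not taken from the figures of this chapter.
example = [
    {"v1": 0.2, "v2": 0.3},
    {"v1": 0.5, "v2": 0.4, "v3": 0.1},
    {"v4": 1.0},
]
print(merge_subclusters(example))
```

In this example the first cluster is absorbed by the second, which keeps its three words, while the third cluster is left untouched.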
Figure 5.2 shows the normalised clustering matrix $C$ for the network in Figure 5.1,
together with the resulting fuzzy clusters. Similarities are computed with
the cosine similarity measure, as follows:
$$
sim(a, b) = \cos(v_a, v_b) = \frac{v_a \cdot v_b}{|v_a|\,|v_b|} = \frac{\sum_{i=0}^{|V|} v_{ai} \times v_{bi}}{\sqrt{\sum_{i=0}^{|V|} v_{ai}^2} \times \sqrt{\sum_{i=0}^{|V|} v_{bi}^2}} \qquad (5.1)
$$

 v1    v2    v3    v4    v5    v6    v7    v8    v9    v10
 0.27  0.17  0.17  0.17  0.14  0.06  0.00  0.00  0.00  0.00
 0.21  0.33  0.16  0.16  0.13  0.00  0.00  0.00  0.00  0.00
 0.21  0.16  0.33  0.16  0.13  0.00  0.00  0.00  0.00  0.00
 0.21  0.16  0.16  0.33  0.13  0.00  0.00  0.00  0.00  0.00
 0.13  0.10  0.10  0.10  0.25  0.14  0.10  0.07  0.00  0.00
 0.07  0.00  0.00  0.00  0.11  0.32  0.23  0.08  0.09  0.09
 0.00  0.00  0.00  0.00  0.17  0.29  0.41  0.14  0.00  0.00
 0.00  0.00  0.00  0.00  0.08  0.07  0.10  0.28  0.24  0.24
 0.00  0.00  0.00  0.00  0.00  0.09  0.00  0.27  0.32  0.32
 0.00  0.00  0.00  0.00  0.00  0.09  0.00  0.27  0.32  0.32

• $F_{1,2,3,4,5} = \{v_1(0.94), v_2(1), v_3(1), v_4(1), v_5(0.68), v_6(0.19), v_7(0.17), v_8(0.08)\}$
• $F_{1,5,6,7,8,9,10} = \{v_1(0.06), v_5(0.32), v_6(0.81), v_7(0.83), v_8(0.92), v_9(1), v_{10}(1)\}$

Figure 5.2: Clustering matrix C after normalisation and resulting fuzzy sets.
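The cosine measure of Equation 5.1 can be sketched in a few lines of plain Python. The function name `cosine_similarity`, the example vectors, and the decision to return 0 when one of the vectors is all zeros are assumptions made here for illustration; the text does not say how that case is handled.

```python
import math


def cosine_similarity(va, vb):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(va, vb))
    norm_a = math.sqrt(sum(x * x for x in va))
    norm_b = math.sqrt(sum(y * y for y in vb))
    if norm_a == 0 or norm_b == 0:
        # Assumption: a word with an all-zero vector is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical adjacency vectors of two words over a three-word vocabulary.
print(cosine_similarity([1, 1, 0], [0, 1, 1]))
```

Two words with identical neighbourhoods get a similarity of 1, and two words with no neighbour in common get 0.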
If $\mu_{F_i}(v_a) > 0$, the word $v_a$ has a sense whose meaning is shared with the other
words in $F_i$. The membership degree $\mu_{F_i}(v_a)$ may be seen as the confidence in the
use of the word $v_a$ with the meaning of the synset $F_i$.
Also, step 3 of the algorithm is optional. In a normalised $C$, all membership
degrees of the same word sum to 1, $\sum_i \mu_{F_i}(v_j) = 1$. Therefore, the membership
degrees of a word $a$ can also be interpreted as its possible senses, together with the
likelihood of $a$ conveying each of their meanings. However, normalising $C$ will
cause highly connected words to have low memberships.
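The sum-to-one property can be checked against the two fuzzy sets reported in Figure 5.2. The following sketch is only a verification aid: the helper name `membership_totals` and the list-of-dictionaries representation are assumptions, while the membership degrees are copied from the figure.

```python
import math
from collections import defaultdict

# Fuzzy sets from Figure 5.2, as word -> membership degree.
fuzzy_synsets = [
    {"v1": 0.94, "v2": 1.0, "v3": 1.0, "v4": 1.0,
     "v5": 0.68, "v6": 0.19, "v7": 0.17, "v8": 0.08},
    {"v1": 0.06, "v5": 0.32, "v6": 0.81, "v7": 0.83,
     "v8": 0.92, "v9": 1.0, "v10": 1.0},
]


def membership_totals(synsets):
    """Sum the membership degrees of each word over all fuzzy synsets."""
    totals = defaultdict(float)
    for synset in synsets:
        for word, degree in synset.items():
            totals[word] += degree
    return dict(totals)


totals = membership_totals(fuzzy_synsets)
# Compare with a tolerance, because the degrees are rounded decimal floats.
print(all(math.isclose(total, 1.0) for total in totals.values()))
```

A word that belongs to several fuzzy synsets has to split this total of 1 among them, which is why highly connected words end up with low individual memberships.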
To obtain simple synsets from the fuzzy synsets $F_i$, one just has to apply a
threshold $\theta$ to the membership degrees, so that all words $a$ with membership lower
than $\theta$, $\mu_{F_i}(v_a) < \theta$, are excluded from the synset. In this case, attention should be
paid when $C$ is normalised, as using the same $\theta$ for all fuzzy synsets might prevent
highly connected words from being included in any synset.
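As a minimal sketch of this thresholding, the fuzzy sets of Figure 5.2 can be cut at a fixed $\theta$. The function name `to_simple_synsets`, the dictionary representation, and the value $\theta = 0.5$ are assumptions chosen for illustration; the text does not prescribe a particular threshold.

```python
# Fuzzy sets from Figure 5.2, keyed by the cluster labels used there.
FUZZY_SYNSETS = {
    "F1,2,3,4,5": {"v1": 0.94, "v2": 1.0, "v3": 1.0, "v4": 1.0,
                   "v5": 0.68, "v6": 0.19, "v7": 0.17, "v8": 0.08},
    "F1,5,6,7,8,9,10": {"v1": 0.06, "v5": 0.32, "v6": 0.81, "v7": 0.83,
                        "v8": 0.92, "v9": 1.0, "v10": 1.0},
}


def to_simple_synsets(fuzzy_synsets, theta):
    """Keep, in each synset, only the words whose membership is at least theta."""
    return {
        name: {word for word, degree in members.items() if degree >= theta}
        for name, members in fuzzy_synsets.items()
    }


# theta = 0.5 is a hypothetical choice, not a value recommended in the text.
simple = to_simple_synsets(FUZZY_SYNSETS, theta=0.5)
for name, words in simple.items():
    print(name, sorted(words))
```

With this threshold, the first synset keeps v1 to v5 and the second keeps v6 to v10. A higher value such as 0.7 would leave v5 out of both synsets, which is the risk of applying a single $\theta$ to a normalised $C$.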