Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...
Chapter 5. Synset Discovery
tb-triples (synpairs) of CARTÃO (introduced in section 4), which establish a synonymy
network. However, this work was done before the latest version of CARTÃO was available,
so the results presented here were obtained with relations extracted, using the PAPEL 2.0
grammars, from DLP, from DA (at an earlier modernisation stage) and from the 25 October
2010 Wiktionary.PT dump.
5.3.1 Synonymy network data<br />
Before running the clustering procedure, we examined some properties of the synonymy
network formed by the synpairs collected from the three dictionaries. Table 5.1 shows
the following properties, which are typically used to analyse graphs:
• Number of nodes |V|, which corresponds to the number of unique lexical items
in the synpair arguments.
• Number of edges |E|, which corresponds to the number of unique synpairs.
• Average degree deg(N) of the network (see expression 5.5), which is the
average number of edges per node.
• Number of nodes of the largest connected sub-network, |V_lcs|, i.e. the size of the
largest group of nodes that are connected, directly or indirectly, in N.
• Average clustering coefficient CC_lcs of the largest connected sub-network,
which measures, as a value in [0, 1], the degree to which nodes tend to cluster
together (see expression 5.7). In random graphs, this coefficient is close to 0.
The local clustering coefficient CC(v_i) of a node v_i (see expression 5.8) quantifies
how connected its neighbours are.
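The first, second and fourth properties can be obtained from a list of synpairs with a single pass to build the network and a breadth-first search to find the largest connected sub-network. The following is a minimal Python sketch, assuming that synpairs are given as (lexical item, lexical item) tuples and that the network is undirected and unweighted; the function names (build_network, num_edges, largest_connected_subnetwork) and the toy pairs are hypothetical and are not taken from the Onto.PT implementation.

```python
from collections import defaultdict, deque


def build_network(synpairs):
    """Build an undirected, unweighted network from (item, item) synpairs.

    Returns a dict mapping each lexical item to the set of its neighbours.
    Self-pairs are skipped; repeated or reversed pairs collapse into one edge.
    """
    adj = defaultdict(set)
    for a, b in synpairs:
        if a == b:
            continue
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


def num_edges(adj):
    """|E|: every undirected edge is stored once in each endpoint's set."""
    return sum(len(neighbours) for neighbours in adj.values()) // 2


def largest_connected_subnetwork(adj):
    """Node set of the largest connected component, found by BFS."""
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    component.add(nb)
                    queue.append(nb)
        if len(component) > len(best):
            best = component
    return best


# Toy synpairs (hypothetical); the reversed duplicate adds no new edge.
pairs = [
    ("carro", "automóvel"), ("automóvel", "viatura"), ("carro", "viatura"),
    ("automóvel", "carro"),
    ("casa", "lar"), ("casa", "moradia"), ("moradia", "vivenda"),
]
adj = build_network(pairs)
lcs = largest_connected_subnetwork(adj)
print(len(adj), num_edges(adj), len(lcs))  # |V|, |E|, |V_lcs| -> 7 6 4
```

Collapsing reversed and repeated pairs in build_network is what makes |E| count unique synpairs rather than raw extractions.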
$$\mathrm{deg}(N) = \frac{1}{|V|} \times \sum_{i=1}^{|V|} \mathrm{deg}(v_i) : v_i \in V \tag{5.5}$$

$$\mathrm{deg}(v_i) = |E(v_i, v_k)| : v_k \in V \tag{5.6}$$

$$CC = \frac{1}{|V|} \times \sum_{i=1}^{|V|} CC(v_i) \tag{5.7}$$

$$CC(v_i) = \frac{2 \times |E(v_j, v_k)|}{K_i \times (K_i - 1)} : v_j, v_k \in \mathrm{neighbours}(v_i) \wedge K_i = |\mathrm{neighbours}(v_i)| \tag{5.8}$$
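Expressions 5.5 to 5.8 translate directly into code over an adjacency map. The sketch below assumes an undirected, unweighted network stored as a dict from each lexical item to the set of its neighbours; the function names and the four-node example are hypothetical. Expression 5.8 is undefined when K_i < 2, and here such nodes are assigned CC(v_i) = 0; this is an assumption, although it is the same convention NetworkX uses for its clustering coefficient.

```python
def degree(adj, v):
    """Expression 5.6: number of edges incident to v."""
    return len(adj[v])


def average_degree(adj):
    """Expression 5.5: mean degree over all nodes of the network."""
    return sum(degree(adj, v) for v in adj) / len(adj)


def local_cc(adj, v):
    """Expression 5.8: share of neighbour pairs of v that are also linked."""
    neighbours = adj[v]
    k = len(neighbours)
    if k < 2:
        return 0.0  # assumption: the expression is undefined for K_i < 2
    # Each link between two neighbours is seen from both ends, hence // 2.
    links = sum(1 for u in neighbours for w in adj[u] if w in neighbours) // 2
    return 2 * links / (k * (k - 1))


def average_cc(adj, nodes=None):
    """Expression 5.7: mean local CC over `nodes` (default: all nodes)."""
    nodes = list(adj) if nodes is None else list(nodes)
    return sum(local_cc(adj, v) for v in nodes) / len(nodes)


# Hypothetical example: triangle a-b-c plus a pendant node d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(average_degree(adj))           # 2.0
print(round(local_cc(adj, "c"), 3))  # 0.333
print(round(average_cc(adj), 3))     # 0.583
```

Passing the node set of the largest connected sub-network as `nodes` gives CC_lcs. Because a connected component contains every neighbour of its nodes, the local coefficients are the same whether they are computed on the component alone or on the whole network.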
Edge weights were not considered when building Table 5.1. Since we extracted four
different types of synonymy, according to the POS of the connected items (nouns, verbs,
adjectives and adverbs; see section 4.2.4 for details), the table presents the properties
of the four synonymy networks separately.
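Keeping the four POS-specific networks apart while discarding the weights can be sketched as follows. This assumes, hypothetically, that each synpair is stored as an (item, item, POS, weight) tuple with short POS labels; neither the tuple layout nor the labels come from the Onto.PT data format, and the function names are illustrative only.

```python
from collections import defaultdict


def networks_by_pos(weighted_synpairs):
    """Split (item, item, pos, weight) synpairs into one network per POS.

    The weight is read but ignored, so each network is unweighted. The
    tuple layout and POS labels are hypothetical.
    """
    nets = defaultdict(lambda: defaultdict(set))
    for a, b, pos, _weight in weighted_synpairs:
        if a == b:
            continue
        nets[pos][a].add(b)
        nets[pos][b].add(a)
    return {pos: dict(adj) for pos, adj in nets.items()}


def size(adj):
    """(|V|, |E|) of an undirected adjacency map."""
    return len(adj), sum(len(s) for s in adj.values()) // 2


data = [
    ("carro", "automóvel", "N", 3),
    ("automóvel", "carro", "N", 1),  # same edge, different weight
    ("andar", "caminhar", "V", 2),
    ("andar", "marchar", "V", 1),
    ("andar", "piso", "N", 1),       # "andar" as a noun: a separate node
]
for pos, adj in sorted(networks_by_pos(data).items()):
    print(pos, size(adj))  # N (4, 2) then V (3, 2)
```

Because the networks are built independently, a form such as andar, which can be a verb or a noun, becomes a distinct node in each network, and the two readings never share edges.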
POS            |V|      |E|   deg(N)   |V_lcs|   CC_lcs
Nouns       39,355   57,813     2.94    25,828     0.14
Verbs       11,502   28,282     4.92    10,631     0.17
Adjectives  15,260   27,040     3.54    11,006     0.16
Adverbs      2,028    2,206     2.52     1,437     0.10
Table 5.1: Properties of the synonymy networks.
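For an undirected, unweighted network, every edge adds 1 to the degree of both of its endpoints, so the degree sum is 2|E| and expression 5.5 reduces to deg(N) = 2|E| / |V|. The short sketch below uses this identity to recompute deg(N) from the |V| and |E| columns of Table 5.1 as a consistency check; the dictionary name TABLE_5_1 and the helper implied_average_degree are hypothetical. It reproduces the noun, verb and adjective rows, while for adverbs it gives about 2.18 rather than the 2.52 listed.

```python
# (|V|, |E|, deg(N)) as reported in Table 5.1.
TABLE_5_1 = {
    "Nouns": (39_355, 57_813, 2.94),
    "Verbs": (11_502, 28_282, 4.92),
    "Adjectives": (15_260, 27_040, 3.54),
    "Adverbs": (2_028, 2_206, 2.52),
}


def implied_average_degree(nodes, edges):
    """Expression 5.5 for an undirected graph: degree sum 2|E| over |V|."""
    return 2 * edges / nodes


for pos, (v, e, reported) in TABLE_5_1.items():
    implied = implied_average_degree(v, e)
    print(f"{pos:<11} implied={implied:.2f} reported={reported:.2f}")
```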