24.07.2013 Views

Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...


Chapter 5. Synset Discovery

tb-triples (synpairs) of CARTÃO (introduced in section 4), which establish a synonymy network. However, this work was done before the last version of CARTÃO was available. The presented results were obtained with relations extracted using the grammars of PAPEL 2.0 in DLP, DA (in a previous modernisation stage) and the 25th October 2010 Wiktionary.PT dump.

5.3.1 Synonymy network data<br />

Before running the clustering procedure, we examined some properties of the synonymy network established by synpairs collected from the three dictionaries. Table 5.1 shows the following properties, typically used to analyse graphs:

• Number of nodes $|V|$, which corresponds to the number of unique lexical items in the synpair arguments.

• Number of edges $|E|$, which corresponds to the number of unique synpairs.

• Average degree $\deg(N)$ of the network (see expression 5.5), which is the average number of edges per node.

• Number of nodes of the largest connected sub-network $|V_{lcs}|$, which is the largest group of nodes connected directly or indirectly in $N$.

• Average clustering coefficient $CC_{lcs}$ of the largest connected sub-network, which measures the degree to which nodes tend to cluster together as a value in [0, 1] (see expression 5.7). In random graphs, this coefficient is close to 0. The local clustering coefficient $CC(v_i)$ (see expression 5.8) of a node $v_i$ quantifies how connected its neighbours are.

$$\deg(N) = \frac{1}{|V|} \times \sum_{i=1}^{|V|} \deg(v_i) : v_i \in V \tag{5.5}$$

$$\deg(v_i) = |E(v_i, v_k)| : v_k \in V \tag{5.6}$$

$$CC = \frac{1}{|V|} \times \sum_{i=1}^{|V|} CC(v_i) \tag{5.7}$$

$$CC(v_i) = \frac{2 \times |E(v_j, v_k)|}{K_i \times (K_i - 1)} : v_j, v_k \in \mathit{neighbours}(v_i) \wedge K_i = |\mathit{neighbours}(v_i)| \tag{5.8}$$
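As an illustration of expressions 5.5 to 5.8, the following minimal sketch computes the average degree and the average clustering coefficient of a small unweighted network. The function names and the toy synpairs are hypothetical and are not taken from the resources described here. Treating nodes with fewer than two neighbours as having a local coefficient of 0, and ignoring pairs of identical items, are assumptions of this sketch.

```python
from itertools import combinations


def build_network(synpairs):
    """Build an undirected, unweighted adjacency map from (a, b) synpairs."""
    adj = {}
    for a, b in synpairs:
        if a == b:
            continue  # assumption of this sketch: pairs of identical items are ignored
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def average_degree(adj):
    """Expression 5.5: mean of deg(v_i) over all nodes."""
    return sum(len(neighbours) for neighbours in adj.values()) / len(adj)


def local_cc(adj, v):
    """Expression 5.8: 2 * |edges among neighbours| / (K * (K - 1))."""
    neighbours = adj[v]
    k = len(neighbours)
    if k < 2:
        return 0.0  # assumed convention: the undefined ratio is treated as 0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))


def average_cc(adj):
    """Expression 5.7: mean of the local clustering coefficients."""
    return sum(local_cc(adj, v) for v in adj) / len(adj)


# Hypothetical toy synpairs, for illustration only.
toy_synpairs = [
    ("carro", "automóvel"),
    ("carro", "viatura"),
    ("automóvel", "viatura"),
    ("carro", "veículo"),
]
toy_network = build_network(toy_synpairs)
print(average_degree(toy_network))  # 2.0
print(round(average_cc(toy_network), 4))  # 0.5833
```

In the toy network, "carro" has three neighbours of which only one pair is linked, so its local coefficient is 1/3, whereas "automóvel" and "viatura" each have two mutually linked neighbours and score 1.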

Weights were not considered in the construction of table 5.1. Since we have extracted four different types of synonymy, according to the POS of the connected items (nouns, verbs, adjectives and adverbs; see section 4.2.4 for details), the same table presents the properties of the four synonymy networks independently.

POS          |V|      |E|      deg(N)   |Vlcs|   CClcs
Nouns        39,355   57,813   2.94     25,828   0.14
Verbs        11,502   28,282   4.92     10,631   0.17
Adjectives   15,260   27,040   3.54     11,006   0.16
Adverbs       2,028    2,206   2.52      1,437   0.10

Table 5.1: Properties of the synonymy networks.
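The per-POS counts in such a table can be obtained along the following lines. This is a minimal sketch, assuming synpairs are available as hypothetical `(item_a, item_b, pos)` records. It is not the procedure actually used to build table 5.1, and the toy records are invented for illustration. Edges are stored in sets, so repeated or reversed synpairs are counted once, and the largest connected sub-network is found with a breadth-first search.

```python
from collections import defaultdict, deque


def networks_by_pos(tagged_synpairs):
    """Group (item_a, item_b, pos) records into one adjacency map per POS."""
    nets = defaultdict(lambda: defaultdict(set))
    for a, b, pos in tagged_synpairs:
        nets[pos][a].add(b)
        nets[pos][b].add(a)
    return nets


def largest_connected_subnetwork(adj):
    """Return the node set of the largest connected component, found by BFS."""
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in component:
                    component.add(w)
                    queue.append(w)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def summarise(adj):
    """Return (|V|, |E|, |V_lcs|) for an undirected, unweighted network."""
    n_nodes = len(adj)
    # Each undirected edge appears in two adjacency sets, hence the halving.
    n_edges = sum(len(neighbours) for neighbours in adj.values()) // 2
    return n_nodes, n_edges, len(largest_connected_subnetwork(adj))


# Hypothetical toy records, for illustration only.
toy_records = [
    ("carro", "automóvel", "noun"),
    ("carro", "viatura", "noun"),
    ("casa", "lar", "noun"),
    ("falar", "dizer", "verb"),
    ("dizer", "falar", "verb"),  # reversed duplicate, counted once
]
nets = networks_by_pos(toy_records)
print(summarise(nets["noun"]))  # (5, 3, 3)
print(summarise(nets["verb"]))  # (2, 1, 2)
```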
