24.07.2013 Views

Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...

6.4. A large thesaurus for Portuguese

Properties of the synonymy networks

In a similar fashion to what was done for the complete network (table 5.1), table 6.6 contains the total number of nodes (|V|) and edges (|E|), and the average network degree (deg(N), computed according to expression 5.5). It also contains the number of sub-networks (Sub-nets), which are groups of nodes connected directly or indirectly in N; the number of nodes of the largest and second largest sub-networks (|Vlcs| and |Vlcs2|); and the average clustering coefficient of the largest sub-network (CClcs, computed according to expression 5.7).
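The quantities listed above can be sketched in a few lines of Python. This is only an illustration: expressions 5.5 and 5.7 are not reproduced on this page, so the sketch assumes that the average degree is 2|E|/|V| and that the clustering coefficient is the usual mean of local clustering coefficients, with nodes of degree below 2 counted as 0. The function names and the dictionary keys are hypothetical.

```python
from collections import defaultdict


def build_graph(edges):
    """Undirected adjacency sets from (word_a, word_b) synonymy pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def sub_networks(adj):
    """Groups of nodes connected directly or indirectly, largest first."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            comp.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        comps.append(comp)
    return sorted(comps, key=len, reverse=True)


def average_clustering(adj, nodes):
    """Mean local clustering coefficient (assumed form of expression 5.7)."""
    total = 0.0
    for v in nodes:
        nbs = list(adj[v])
        k = len(nbs)
        if k < 2:
            continue  # contributes 0 to the mean
        links = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if nbs[j] in adj[nbs[i]]
        )
        total += 2 * links / (k * (k - 1))
    return total / len(nodes)


def network_properties(edges):
    adj = build_graph(edges)
    if not adj:
        raise ValueError("the network has no edges")
    comps = sub_networks(adj)
    n_nodes = len(adj)
    n_edges = sum(len(nbs) for nbs in adj.values()) // 2
    return {
        "V": n_nodes,
        "E": n_edges,
        "deg": 2 * n_edges / n_nodes,  # assumed form of expression 5.5
        "sub_nets": len(comps),
        "V_lcs": len(comps[0]),
        "V_lcs2": len(comps[1]) if len(comps) > 1 else 0,
        "CC_lcs": average_clustering(adj, comps[0]),
    }
```

Because the graph is built from an edge list, isolated words never appear as nodes, which matches a network made only of synonymy pairs.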

From table 6.6, we notice that these synonymy networks are significantly different from the original. First, they are smaller, as they only contain about half of the nouns, one sixth of the verbs and one third of the adjectives. Second, they have substantially lower degrees, and clustering coefficients close to 0, which means they are less connected and do not tend to form clusters. Nevertheless, they still have one large core sub-network and several smaller ones.

This confirms that a simpler clustering algorithm is suitable for our purpose, especially because ambiguity is much lower and several clusters are already defined by complete small sub-networks. The noun network contains 4,470 sub-networks of size 2 and 1,127 of size 3. These numbers are, respectively, 437 and 97 for verbs, and 1,303 and 262 for adjectives.
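To illustrate what "already defined by complete small sub-networks" means, the sketch below separates sub-networks in which every pair of words is linked from those that still need clustering. A sub-network of size 2 is always complete, while one of size 3 is complete only if it forms a triangle. The helper names are hypothetical, and nothing here is meant to reproduce the algorithm of section 6.3.

```python
def is_complete(component, adj):
    """True if every pair of nodes in the component is directly connected."""
    n = len(component)
    # Each internal edge is counted once from each endpoint.
    edge_count = sum(len(adj[v] & component) for v in component) // 2
    return edge_count == n * (n - 1) // 2


def split_ready_clusters(components, adj):
    """Separate complete sub-networks from those that need clustering."""
    ready, pending = [], []
    for comp in components:
        (ready if is_complete(comp, adj) else pending).append(comp)
    return ready, pending


# Toy adjacency sets with placeholder words (not taken from the thesaurus).
adj = {
    "a": {"b"}, "b": {"a"},                                  # pair
    "p": {"q"}, "q": {"p", "r"}, "r": {"q"},                 # path of 3
    "x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y"},       # triangle
}
components = [{"a", "b"}, {"p", "q", "r"}, {"x", "y", "z"}]
ready, pending = split_ready_clusters(components, adj)
```

In this toy input the pair and the triangle are complete, so each can be read directly as one cluster; only the three-word path is left for the clustering step.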

POS        |V|     |E|     deg(N)  Sub-nets  |Vlcs|  CClcs  |Vlcs2|
Noun       21,272  15,294  1.44    6,556     2,816   0.03   66
Verb       1,807   1,197   1.32    614       153     0.00   29
Adjective  4,695   3,050   1.30    1,743     169     0.02   50

Table 6.6: Properties of the synonymy networks remaining after assignment.
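As a quick sanity check on the figures, the sketch below recomputes the average degree from |V| and |E| and the share of sub-networks that have only 2 or 3 nodes. It assumes that deg(N) is 2|E|/|V|; expression 5.5 is not shown on this page, but the values in table 6.6 are consistent with that form. The variable and function names are illustrative.

```python
# Values copied from table 6.6.
TABLE_6_6 = {
    "Noun": {"V": 21272, "E": 15294, "deg": 1.44, "sub_nets": 6556},
    "Verb": {"V": 1807, "E": 1197, "deg": 1.32, "sub_nets": 614},
    "Adjective": {"V": 4695, "E": 3050, "deg": 1.30, "sub_nets": 1743},
}

# Sub-networks of size 2 and size 3, as reported in the text.
SMALL = {"Noun": (4470, 1127), "Verb": (437, 97), "Adjective": (1303, 262)}


def recomputed_degree(pos):
    """Average degree under the assumption deg(N) = 2|E| / |V|."""
    row = TABLE_6_6[pos]
    return round(2 * row["E"] / row["V"], 2)


def small_share(pos):
    """Fraction of sub-networks that have only 2 or 3 nodes."""
    return sum(SMALL[pos]) / TABLE_6_6[pos]["sub_nets"]


for pos in TABLE_6_6:
    print(pos, recomputed_degree(pos), f"{small_share(pos):.1%}")
```

Under this assumption the recomputed degrees match the table, and between roughly 85% and 90% of the sub-networks in each network have only two or three nodes.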

Clustering Examples<br />

Figures 6.2, 6.3 and 6.4 illustrate the result of clustering in three sub-networks. The first sub-network results in only one cluster, with several synonyms for someone who speaks Greek. The second and the third are divided into different clusters, represented by different shades of grey.

In figure 6.3, the sub-network is divided into two different meanings of the verb 'splash', one of them more abstract (esparrinhar), and the other done with the feet or hands (bachicar), but three words may be used with both meanings. The meanings covered by the four clusters in figure 6.4 are, respectively: a person who gives moral qualities; a person who evangelises; a person who spreads ideas; and a person who is an active member of a cause.

Evaluation of the clustering results

In order to check if the algorithm described in section 6.3 is efficient, and to get an idea of the quality of the discovered clusters, we performed a manual evaluation of them. Once again, we had two judges classify pairs of words, collected from the same synset, as synonymous or not. This kind of evaluation is easier and slightly less subjective than the evaluation of complete synsets. Furthermore, in section 5.3.5 we reported similar results using both kinds of evaluation.
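The pair-based evaluation described above could be organised roughly as in the following sketch, which enumerates word pairs within each synset and summarises the judgements of two judges. The function names, the placeholder words and the agreement figure are assumptions made for illustration; the text does not specify how pairs were sampled or how the judgements were aggregated.

```python
from itertools import combinations


def synset_pairs(synsets):
    """All unordered word pairs that share a synset."""
    pairs = []
    for synset in synsets:
        pairs.extend(combinations(sorted(synset), 2))
    return pairs


def summarise(judge_a, judge_b):
    """Each judge maps a word pair to True (synonymous) or False."""
    shared = [pair for pair in judge_a if pair in judge_b]
    if not shared:
        raise ValueError("the judges have no pairs in common")
    n = len(shared)
    return {
        "pairs": n,
        "correct_a": sum(judge_a[p] for p in shared) / n,
        "correct_b": sum(judge_b[p] for p in shared) / n,
        # Simple proportion of identical answers, not a chance-corrected score.
        "agreement": sum(judge_a[p] == judge_b[p] for p in shared) / n,
    }


# Placeholder synsets and judgements, not real evaluation data.
pairs = synset_pairs([{"w1", "w2", "w3"}, {"w4", "w5"}])
judge_a = {p: True for p in pairs}
judge_a[("w2", "w3")] = False
judge_b = {p: True for p in pairs}
summary = summarise(judge_a, judge_b)
```

Judging pairs rather than whole synsets keeps each decision binary, which is one way to read the remark that this evaluation is easier and slightly less subjective.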
