Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...
Chapter 6. Thesaurus Enrichment<br />
POS            Noun              Verb              Adjective<br />
Synpairs       67,401            28,895            34,844<br />
In TeP         15,183 (22.5%)    13,891 (48.1%)    11,930 (34.2%)<br />
|C| = 0        14,902 (22.1%)    615 (2.1%)        2,659 (7.6%)<br />
|C| = 1        8,902 (13.2%)     960 (3.3%)        3,365 (9.7%)<br />
|C| > 1        28,414 (42.2%)    13,429 (46.6%)    16,890 (48.5%)<br />
Average |C|    4.30              8.49              4.34<br />
Table 6.4: Coverage of the synpairs by TeP.<br />
where synpairs have a second chance of being assigned to a synset, this time using<br />
the same similarity measure, but with a higher threshold, σ = 0.35. This<br />
value obtained the best precision in the mode All, once again measured against all human<br />
references. The second iteration aims to integrate previously unassigned synpairs for which,<br />
after the first iteration, there is high confidence in the assignment to a synset.<br />
After the assignment stage, 37,767 noun, 14,459 verb and 20,310 adjective synpairs<br />
were assigned to at least one TeP synset. Of those, respectively 35,247, 14,246<br />
and 19,595 were assigned during the first iteration and 2,520, 213 and 715 during<br />
the second. Table 6.5 presents examples of real assignments and the iteration in which<br />
they were made.²<br />
It.    Synpair    Synset<br />
1st    {alimentação, mantença}    {sustento, alimento, mantimento, alimentação}<br />
1st    {escravizar, servilizar}    {oprimir, tiranizar, escravizar, esmagar}<br />
1st    {permanente, inextinguível}    {durador, duradoiro, duradouro, durável, permanente, perdurável}<br />
2nd    {cortadura, cortadela}    {golpe, cisão, cortadela, rasgue, corte, incisura, rasgo, cortadura, incisão}<br />
2nd    {reificar, substancializar}    {realizar, coisificar, efetivar, efeituar, consumar, efectivar, efetuar, concretizar, reificar, hipostasiar, substantificar}<br />
2nd    {encorajante, entusiasmante}    {empolgante, entusiasmante, galvanizante, galvanizador}<br />
Table 6.5: Examples of assignments.<br />
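The two-iteration assignment can be illustrated with a minimal sketch. It is not the procedure used in this work: the similarity measure and the first-iteration threshold are not given in this section, so the sketch substitutes a Jaccard overlap between neighbourhoods in the synpair graph (`jaccard_sim`) and takes the first threshold as a free parameter (`sigma_first`). It also assumes that the candidate synsets of a synpair are those already containing one of its words, and that the second pass sees the synsets as enriched by the first. All function names are hypothetical.

```python
from itertools import chain


def build_graph(pairs):
    """Adjacency sets of the synonymy network induced by the synpairs."""
    graph = {}
    for pair in pairs:
        a, b = pair
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def neighbours(words, graph):
    """The given words plus every word linked to them in the graph."""
    return set(words) | set(chain.from_iterable(graph.get(w, ()) for w in words))


def jaccard_sim(pair, synset, graph):
    """Stand-in similarity (an assumption, not the measure of the original work)."""
    a, b = neighbours(pair, graph), neighbours(synset, graph)
    return len(a & b) / len(a | b)


def assign_pass(pairs, synsets, graph, sigma):
    """Add each pair to every candidate synset scoring at least sigma.

    Synsets are updated in place once the whole pass is done; the pairs
    that could not be assigned are returned.
    """
    additions, unassigned = [], []
    for pair in pairs:
        # Assumed candidate set C: synsets sharing a word with the pair.
        candidates = [i for i, s in enumerate(synsets) if s & pair]
        hits = [i for i in candidates
                if jaccard_sim(pair, synsets[i], graph) >= sigma]
        if hits:
            additions.extend((i, pair) for i in hits)
        else:
            unassigned.append(pair)
    for i, pair in additions:
        synsets[i] |= pair
    return unassigned


def two_iterations(pairs, synsets, sigma_first, sigma_second=0.35):
    """First pass with sigma_first, second pass on the leftovers with sigma_second."""
    graph = build_graph(pairs)
    enriched = [set(s) for s in synsets]
    left = assign_pass(pairs, enriched, graph, sigma_first)
    left = assign_pass(left, enriched, graph, sigma_second)
    return enriched, left
```

In this sketch, a synpair with no candidate synset in the first pass can still be assigned in the second one, because the first pass may have added one of its words to a synset. Synpairs left after both passes remain available for clustering.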
6.4.3 Clustering for new synsets<br />
In order to discover new synsets, the clustering procedure described in section 6.3 was applied<br />
to the remaining synpairs, with θ = 0.5. Before clustering, we analysed some properties<br />
of the synonymy networks they form. After clustering, some of the results obtained<br />
were evaluated manually.<br />
The discovered clusters were integrated into the enriched TeP, following the integration<br />
procedure described in section 6.3, using an empirically defined threshold, µ = 0.5.<br />
However, few clusters were merged: only 81 noun, 16 verb<br />
and 29 adjective clusters were merged into existing synsets. The rest of the clusters<br />
were added as new synsets.<br />
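The merge-or-add decision above can be sketched as follows. The integration procedure itself is defined in section 6.3 and is not reproduced here, so the overlap measure (the share of a cluster's words already present in a synset) is only an assumption, and `overlap` and `integrate` are hypothetical names.

```python
def overlap(cluster, synset):
    """Share of the cluster's words already in the synset (assumed measure)."""
    return len(cluster & synset) / len(cluster)


def integrate(clusters, synsets, mu=0.5):
    """Merge each cluster into its best-matching existing synset when the
    overlap reaches mu; otherwise add the cluster as a new synset.

    Returns the resulting synsets and the number of merged and added clusters.
    """
    result = [set(s) for s in synsets]
    n_existing = len(result)  # clusters are only compared with original synsets
    merged = added = 0
    for cluster in clusters:
        scores = [overlap(cluster, s) for s in result[:n_existing]]
        best = max(range(n_existing), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= mu:
            result[best] |= cluster
            merged += 1
        else:
            result.append(set(cluster))
            added += 1
    return result, merged, added
```

Because the clusters are built from synpairs that could not be assigned to any synset, most of them share few words with the existing synsets, which is consistent with the small number of merges reported above.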
² Intentionally, no translations are provided because, if translated, most of the provided examples<br />
would not capture the essence of this task.