
Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...


Chapter 6. Thesaurus Enrichment

POS             Noun             Verb             Adjective
Synpairs        61,025           28,895           34,844
In TeP          15,183 (22.5%)   13,891 (48.1%)   11,930 (34.2%)
|C| = 0         14,902 (22.1%)   615 (2.1%)       2,659 (7.6%)
|C| = 1         8,902 (13.2%)    960 (3.3%)       3,365 (9.7%)
|C| > 1         28,414 (42.2%)   13,429 (46.6%)   16,890 (48.5%)
Average |C|     4.30             8.49             4.34

Table 6.4: Coverage of the synpairs by TeP.

where synpairs have a second chance of being assigned to a synset, this time using the same similarity measure, but with a higher threshold, σ = 0.35. The previous value obtained the best precision in the mode All and, once again, against all human references. The second iteration aims to integrate the unassigned synpairs for which, after the first iteration, there is high confidence in the assignment to a synset. After the assignment stage, 37,767 noun, 14,459 verb and 20,310 adjective synpairs were assigned to at least one TeP synset. Of those, 35,247, 14,246 and 19,595, respectively, were assigned during the first iteration, and 2,520, 213 and 715 during the second. Table 6.5 presents examples of real assignments and the iteration in which they were made.²

It.   Synpair                        Synset
1st   {alimentação, mantença}        {sustento, alimento, mantimento, alimentação}
1st   {escravizar, servilizar}       {oprimir, tiranizar, escravizar, esmagar}
1st   {permanente, inextinguível}    {durador, duradoiro, duradouro, durável, permanente, perdurável}
2nd   {cortadura, cortadela}         {golpe, cisão, cortadela, rasgue, corte, incisura, rasgo, cortadura, incisão}
2nd   {reificar, substancializar}    {realizar, coisificar, efetivar, efeituar, consumar, efectivar, efetuar, concretizar, reificar, hipostasiar, substantificar}
2nd   {encorajante, entusiasmante}   {empolgante, entusiasmante, galvanizante, galvanizador}

Table 6.5: Examples of assignments.
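
The two-iteration assignment described above can be illustrated with a minimal sketch. Several parts are assumptions made for illustration only: the similarity measure is replaced by a Jaccard coefficient between a synpair's neighbourhood in the synonymy network and the synset, the first-iteration threshold (`sigma_first`) is a hypothetical parameter, and synsets are assumed to be enriched with the assigned words between iterations. Only the second threshold, σ = 0.35, comes from the text. The toy words are placeholders, not data from the experiment.

```python
from typing import Dict, List, Set, Tuple

Synpair = Tuple[str, str]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard coefficient; an assumed stand-in for the text's similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def context(pair: Synpair, network: Dict[str, Set[str]]) -> Set[str]:
    """Both words of the pair plus their neighbours in the synonymy network."""
    words = set(pair)
    for word in pair:
        words |= network.get(word, set())
    return words


def assign(pairs, synsets, network, sigma) -> Dict[Synpair, List[int]]:
    """Map each pair to the indices of all synsets with similarity >= sigma."""
    result = {}
    for pair in pairs:
        ctx = context(pair, network)
        hits = [i for i, s in enumerate(synsets) if jaccard(ctx, s) >= sigma]
        if hits:
            result[pair] = hits
    return result


def enrich(synsets: List[Set[str]], assignment: Dict[Synpair, List[int]]) -> None:
    """Add the words of each assigned pair to every synset it was assigned to."""
    for pair, hits in assignment.items():
        for i in hits:
            synsets[i].update(pair)


def two_iterations(pairs, synsets, network, sigma_first, sigma_second=0.35):
    """Assign pairs, enrich the synsets, then retry the unassigned pairs."""
    synsets = [set(s) for s in synsets]  # work on a copy of the thesaurus
    first = assign(pairs, synsets, network, sigma_first)
    enrich(synsets, first)
    remaining = [p for p in pairs if p not in first]
    second = assign(remaining, synsets, network, sigma_second)
    enrich(synsets, second)
    return synsets, first, second


# Toy data (placeholders): one synset and five synpairs.
pairs = [("a", "x"), ("b", "x"), ("x", "y"), ("y", "z"), ("y", "w")]
network: Dict[str, Set[str]] = {}
for u, v in pairs:
    network.setdefault(u, set()).add(v)
    network.setdefault(v, set()).add(u)

synsets, first, second = two_iterations(
    pairs, [{"a", "b", "c"}], network, sigma_first=0.3  # 0.3 is hypothetical
)
```

In this toy run, the pair `("x", "y")` falls below the first threshold, but once `x` has entered the synset its overlap rises enough to pass the stricter second threshold, which is the kind of case the second iteration is meant to capture.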

6.4.3 Clustering for new synsets<br />

In order to discover new synsets, the clustering procedure in section 6.3 was applied to the remaining synpairs, with θ = 0.5. Before clustering, we analysed some properties of the synonymy networks they form. After clustering, some of the obtained results were evaluated manually.

The discovered clusters were integrated into the enriched TeP, following the integration procedure described in section 6.3, with an empirically defined threshold µ = 0.5. However, few clusters were merged: 81 noun, 16 verb and 29 adjective clusters were merged into existing synsets. The remaining clusters were added as new synsets.
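
The merge-or-add decision can be sketched as follows. The integration procedure of section 6.3 is not reproduced here, so this is only an assumed approximation: a cluster is merged into the most similar existing synset when their Jaccard overlap reaches µ, and is otherwise added as a new synset. The overlap measure, the `>=` comparison and the function name `integrate` are all hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def integrate(
    clusters: Iterable[Set[str]],
    synsets: Iterable[Set[str]],
    mu: float = 0.5,
) -> Tuple[List[Set[str]], int, int]:
    """Merge each cluster into its best-matching existing synset, or add it as new."""
    result = [set(s) for s in synsets]  # copy, so the input thesaurus is untouched
    n_existing = len(result)            # only pre-existing synsets are merge targets
    merged = added = 0
    for cluster in clusters:
        cluster = set(cluster)
        best_index, best_score = -1, 0.0
        for i in range(n_existing):
            union = cluster | result[i]
            score = len(cluster & result[i]) / len(union) if union else 0.0
            if score > best_score:
                best_index, best_score = i, score
        if best_index >= 0 and best_score >= mu:  # assumed overlap criterion
            result[best_index] |= cluster
            merged += 1
        else:
            result.append(cluster)
            added += 1
    return result, merged, added


# Toy data (placeholders): one existing synset and two discovered clusters.
thesaurus, merged, added = integrate(
    clusters=[{"a", "b", "d"}, {"p", "q"}],
    synsets=[{"a", "b", "c"}],
)
```

Under these assumptions, a cluster sharing two of four distinct words with a synset sits exactly at µ = 0.5 and is merged, while a cluster with no overlap becomes a new synset, which is consistent with most clusters being added rather than merged.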

² Intentionally, no translations are provided because, if translated, most of the provided examples would not capture the essence of this task.
