Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...
Chapter 6. Thesaurus Enrichment
canjica, raposa, garrana, raposeira, cartola, cachorra, entusiasmo, carpanta, piteira, borracheira,
cabeleira, carrocha, pifo, camoeca, marta, cachaceira, zangurriana, verniz, carrada
• patamaz, boca-aberta, imbecil, lucas, malhadeiro, orate, zé-cuecas, lerdaço, tantã, boleima,
babão, jato, zambana, badó, ânsar, bolônio, chapetão, parvalhão, haule, papa-moscas,
lerdo, patau, sànona, perturbado, possidónio, babaquara, tolo, galafura, babuíno,
zângano, inepto, badana, cabaça, andor, pax-vóbis, idiota, pascoal-bailão, sandeu, asneirão,
zé, capadócio, calino, doudivanas, pasguate, parreco, babanca, palerma, molusco,
parrana, moco, ansarinho, bajoujo, burro, truão, estulto, pexote, maninelo, lérias, banana,
banazola, patego, bobo, estúpido, asno, sonso, ignorante, troixa, otário, simplório,
pancrácio, patola, songo-mongo, toleirão, totó, burgesso, morcão, microcéfalo, patinho,
bacoco, babancas, inhenha, pàteta, néscio, matias, parvoinho, mané, anastácio, manembro,
tatamba, bobalhão, bertoldo, patavina, tonto, apedeuto, pachocho, ingênuo, bocoió, simplacheirão,
jerico, zote, sebastião, lorpa, atónito, patacão, pato, parvoeirão, ingénuo, papalvo,
pateta, tanso, cretino, bolónio, basbaque, mentecapto, pachola, apaixonado, pasmão,
pascácio, tarola, trouxa, parvo, jumento, geta, arara, gato-bravo, pedaço-de-asno, parvajola,
pacóvio, laparoto, crendeiro, loura
In the previous synsets, the words of the original TeP synsets are presented in
bold. Other large synsets cover the concepts of a harsh reprimand (100 words, including
ralho, ensinadela, descasca, raspanete, descompostura), trickery (95 words, including
peta, embuste, manha, barrete, tramóia), prostitute (73 words, including pega,
menina, mulher-da-vida, meretriz, quenga, rameira, ...), a rascal/mischievous person
(72 words, including pulha, traste, gandulo, salafrário, patife, tratante, ...), and
money (60 words, including pastel, massa, grana, guita, carcanhol). Also, among the
synsets obtained by clustering, the only noun synset with more than 25 words refers to
the concept of 'backside' or 'butt', and contains words such as bufante, padaria or peida.
In TeP 2.0, the largest noun synset refers to a strike or aggression with some tool, and
includes words such as paulada, bastonada, marretada and pancada.
Furthermore, the largest verb synset in the final thesaurus means to mislead and
contains words such as embromar, ludibriar, embaciar, enrolar, vigarizar, or intrujar. The
largest adjective synset denotes the quality of being shifty or deceitful and contains
words such as artificioso, matreiro, ardiloso, traiçoeiro, and sagaz.
6.5 Discussion<br />
We have presented our work towards the enrichment of a thesaurus, structured in
synsets, with synonymy information automatically acquired from general language
dictionaries. The four-step enrichment approach resulted in TRIP, a large Portuguese
thesaurus, obtained after enriching TeP, a Brazilian Portuguese thesaurus,
with information extracted from three Portuguese dictionaries and a smaller Portuguese
thesaurus. There are some similarities between the work presented here and
the work of Tokunaga et al. (2001), for Japanese. However, our thesaurus is simpler,
as it does not contain taxonomic information. Furthermore, although it was used
for Portuguese, the proposed approach might be adapted to other languages.
Given that it is created using a handcrafted thesaurus as a starting point, the
resulting thesaurus is more reliable than the thesaurus obtained in the previous
chapter. The evaluation of the assignment procedure and of the obtained clusters
supports this, as both showed higher precision. Therefore, Onto.PT was built with
the four-step approach presented in this chapter instead of the approach described
in the previous chapter, where synsets are discovered from scratch.