24.07.2013 Views

Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...



Chapter 6. Thesaurus Enrichment

canjica, raposa, garrana, raposeira, cartola, cachorra, entusiasmo, carpanta, piteira, borracheira, cabeleira, carrocha, pifo, camoeca, marta, cachaceira, zangurriana, verniz, carrada

• patamaz, boca-aberta, imbecil, lucas, malhadeiro, orate, zé-cuecas, lerdaço, tantã, boleima, babão, jato, zambana, badó, ânsar, bolônio, chapetão, parvalhão, haule, papa-moscas, lerdo, patau, sànona, perturbado, possidónio, babaquara, tolo, galafura, babuíno, zângano, inepto, badana, cabaça, andor, pax-vóbis, idiota, pascoal-bailão, sandeu, asneirão, zé, capadócio, calino, doudivanas, pasguate, parreco, babanca, palerma, molusco, parrana, moco, ansarinho, bajoujo, burro, truão, estulto, pexote, maninelo, lérias, banana, banazola, patego, bobo, estúpido, asno, sonso, ignorante, troixa, otário, simplório, pancrácio, patola, songo-mongo, toleirão, totó, burgesso, morcão, microcéfalo, patinho, bacoco, babancas, inhenha, pàteta, néscio, matias, parvoinho, mané, anastácio, manembro, tatamba, bobalhão, bertoldo, patavina, tonto, apedeuto, pachocho, ingênuo, bocoió, simplacheirão, jerico, zote, sebastião, lorpa, atónito, patacão, pato, parvoeirão, ingénuo, papalvo, pateta, tanso, cretino, bolónio, basbaque, mentecapto, pachola, apaixonado, pasmão, pascácio, tarola, trouxa, parvo, jumento, geta, arara, gato-bravo, pedaço-de-asno, parvajola, pacóvio, laparoto, crendeiro, loura

In the previous synsets, the words of the original TeP synsets are presented in bold. Other large synsets cover the concepts of strong criticism (100 words, including ralho, ensinadela, descasca, raspanete, descompostura), trickery (95 words, including peta, embuste, manha, barrete, tramóia), prostitute (73 words, including pega, menina, mulher-da-vida, meretriz, quenga, rameira, ...), a rascal or mischievous person (72 words, including pulha, traste, gandulo, salafrário, patife, tratante, ...), and money (60 words, including pastel, massa, grana, guita, carcanhol). Also, in the clustering step, the only noun synset with more than 25 words refers to the concept of 'backside' or 'butt', and contains words such as bufante, padaria or peida. In TeP 2.0, the largest noun synset refers to a strike or aggression with some tool, and includes words such as paulada, bastonada, marretada and pancada.

Furthermore, the largest verb synset in the final thesaurus means to mislead and contains words such as embromar, ludibriar, embaciar, enrolar, vigarizar, or intrujar. The largest adjective synset denotes the quality of being shifty or deceitful and contains words such as artificioso, matreiro, ardiloso, traiçoeiro, and sagaz.

6.5 Discussion

We have presented our work towards the enrichment of a thesaurus, structured in synsets, with synonymy information automatically acquired from general language dictionaries. The four-step enrichment approach resulted in TRIP, a large Portuguese thesaurus, obtained by enriching TeP, a Brazilian Portuguese thesaurus, with information extracted from three Portuguese dictionaries and a smaller Portuguese thesaurus. There are some similarities between the work presented here and the work of Tokunaga et al. (2001) for Japanese. However, our thesaurus is simpler, as it does not contain taxonomic information. Furthermore, although it was applied to Portuguese, the proposed approach might be adapted to other languages.

Given that it is created using a handcrafted thesaurus as a starting point, the resulting thesaurus is more reliable than the thesaurus obtained in the previous chapter. The evaluations of the assignment procedure and of the obtained clusters also indicate this, as both showed higher precision. Therefore, in the construction of Onto.PT, the four-step approach described in this chapter was used instead of the one described in the previous chapter, where synsets are discovered from scratch.
