Chapter 6. Thesaurus Enrichment

canjica, raposa, garrana, raposeira, cartola, cachorra, entusiasmo, carpanta, piteira, borracheira, cabeleira, carrocha, pifo, camoeca, marta, cachaceira, zangurriana, verniz, carrada

• patamaz, boca-aberta, imbecil, lucas, malhadeiro, orate, zé-cuecas, lerdaço, tantã, boleima, babão, jato, zambana, badó, ânsar, bolônio, chapetão, parvalhão, haule, papa-moscas, lerdo, patau, sànona, perturbado, possidónio, babaquara, tolo, galafura, babuíno, zângano, inepto, badana, cabaça, andor, pax-vóbis, idiota, pascoal-bailão, sandeu, asneirão, zé, capadócio, calino, doudivanas, pasguate, parreco, babanca, palerma, molusco, parrana, moco, ansarinho, bajoujo, burro, truão, estulto, pexote, maninelo, lérias, banana, banazola, patego, bobo, estúpido, asno, sonso, ignorante, troixa, otário, simplório, pancrácio, patola, songo-mongo, toleirão, totó, burgesso, morcão, microcéfalo, patinho, bacoco, babancas, inhenha, pàteta, néscio, matias, parvoinho, mané, anastácio, manembro, tatamba, bobalhão, bertoldo, patavina, tonto, apedeuto, pachocho, ingênuo, bocoió, simplacheirão, jerico, zote, sebastião, lorpa, atónito, patacão, pato, parvoeirão, ingénuo, papalvo, pateta, tanso, cretino, bolónio, basbaque, mentecapto, pachola, apaixonado, pasmão, pascácio, tarola, trouxa, parvo, jumento, geta, arara, gato-bravo, pedaço-de-asno, parvajola, pacóvio, laparoto, crendeiro, loura

In the previous synsets, the words of the original TeP synsets are presented in bold.
Other large synsets cover the concepts of strong criticism (100 words, including ralho, ensinadela, descasca, raspanete, descompostura), trickery (95 words, including peta, embuste, manha, barrete, tramóia), prostitute (73 words, including pega, menina, mulher-da-vida, meretriz, quenga, rameira, ...), a rascal or mischievous person (72 words, including pulha, traste, gandulo, salafrário, patife, tratante, ...), and money (60 words, including pastel, massa, grana, guita, carcanhol). In the clustering step, the only noun synset with more than 25 words refers to the concept of 'backside' or 'butt', and contains words such as bufante, padaria or peida. In TeP 2.0, the largest noun synset refers to a strike or aggression with some tool, and includes words such as paulada, bastonada, marretada and pancada. Furthermore, the largest verb synset in the final thesaurus means to mislead and contains words such as embromar, ludibriar, embaciar, enrolar, vigarizar, or intrujar. The largest adjective synset denotes the quality of being shifty or deceitful and contains words such as artificioso, matreiro, ardiloso, traiçoeiro, and sagaz.

6.5 Discussion

We have presented our work towards the enrichment of a thesaurus, structured in synsets, with synonymy information automatically acquired from general language dictionaries. The four-step enrichment approach resulted in TRIP, a large Portuguese thesaurus, obtained after enriching TeP, a Brazilian Portuguese thesaurus, with information extracted from three Portuguese dictionaries and a smaller Portuguese thesaurus.
There are some similarities between the work presented here and the work of Tokunaga et al. (2001), for Japanese. However, our thesaurus is simpler, as it does not contain taxonomic information. Furthermore, although it was applied to Portuguese, the proposed approach might be adapted to other languages. Given that it is created using a handcrafted thesaurus as a starting point, the resulting thesaurus is more reliable than the thesaurus obtained in the previous chapter. The evaluation of the assignment procedure and of the obtained clusters also points that out, as both have shown higher precision. Therefore, in the construction of Onto.PT, the four-step approach described in this chapter was used instead of that described in the previous chapter, where synsets are discovered from scratch.
Another contribution of this part of the work is that TeP, originally made for Brazilian Portuguese, is enriched with words from dictionaries whose entries contain, mainly³, words from European Portuguese. Therefore, besides being larger, the new thesaurus has a better coverage of European Portuguese than TeP. Also, once again due to its public domain character, the resulting thesaurus is another suitable alternative to replace OpenThesaurus.PT as the thesaurus of the OpenOffice word processor.

One limitation of the work presented here is the amount of manual observation required to select the best assignment settings. An alternative would be to develop a procedure to automatically learn the best measures and thresholds for associating a synpair with a synset. Given that we already have a small gold resource, a supervised learning approach would suit this purpose. A simple linear classifier, such as a perceptron (Rosenblatt, 1958), would probably be enough to learn the best threshold, given a set of labelled correct and incorrect examples for each assignment. This is left as future work. Also, in order to get more reliable results, the gold resource should be augmented as well. As it currently contains only nouns, special attention should be given in the future to the inclusion of verbs and adjectives.
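As a rough illustration of the threshold learning suggested above, the following minimal sketch trains a one-feature perceptron on labelled assignments and reads the decision threshold off the learned weights. It is not part of the reported work: the similarity scores, the labels in `GOLD`, and the function names are hypothetical, and a single similarity score per synpair-synset assignment is assumed.

```python
from typing import List, Tuple

# Hypothetical gold examples: (similarity between a synpair and a synset,
# 1 if the assignment was judged correct, 0 otherwise). Invented values.
GOLD: List[Tuple[float, int]] = [
    (0.05, 0), (0.10, 0), (0.20, 0), (0.30, 0),
    (0.50, 1), (0.60, 1), (0.80, 1), (0.90, 1),
]


def train_perceptron(examples: List[Tuple[float, int]],
                     epochs: int = 1000,
                     lr: float = 0.1) -> Tuple[float, float]:
    """Learn w and b so that w * score + b > 0 predicts a correct assignment."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        mistakes = 0
        for score, label in examples:
            predicted = 1 if w * score + b > 0 else 0
            delta = label - predicted  # +1, 0 or -1
            if delta != 0:
                # Classic perceptron update, applied only on misclassification.
                w += lr * delta * score
                b += lr * delta
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return w, b


def predict(score: float, w: float, b: float) -> int:
    """Return 1 if the assignment is accepted, 0 otherwise."""
    return 1 if w * score + b > 0 else 0


w, b = train_perceptron(GOLD)
# With one feature, the decision boundary w * score + b = 0 is a threshold.
threshold = -b / w
```

Because the invented examples are linearly separable, the perceptron converges and `threshold` falls between the highest-scoring incorrect example and the lowest-scoring correct one. With real annotations the classes may overlap, in which case the loop simply stops after `epochs` passes and the resulting threshold is only an approximation.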
³ Wiktionary.PT covers all variants of Portuguese, and PAPEL contains a minority of words in other variants of Portuguese, including Brazilian, Angolan and Mozambican.