108 Chapter 6. Thesaurus Enrichment

most ambiguous word (Max(senses)). For the synsets (Table 6.9), we present their quantity (Total), their average size in terms of words (Avg(size)), the number of synsets of size 2 (size = 2) and of size greater than 25 (size > 25), and the size of the largest synset (max(size)).

                             Words
Thesaurus       POS          Total     Ambiguous   Avg(senses)   Max(senses)
TeP 2.0         Noun         17,149    5,802       1.71          20
                Verb         8,280     4,680       2.69          50
                Adjective    14,568    3,730       1.46          19
                Adverb       1,095     227         1.30          11
1st iteration   Noun         28,693    11,794      1.98          22
                Verb         11,272    6,357       2.85          50
                Adjective    19,148    7,149       1.85          21
                Adverb       1,865     499         1.40          12
2nd iteration   Noun         29,223    11,988      1.99          22
                Verb         11,301    6,374       2.86          50
                Adjective    19,291    7,213       1.85          21
                Adverb       1,914     513         1.40          12
Clusters        Noun         21,126    2,196       1.14          5
                Verb         1,801     177         1.13          4
                Adjective    4,687     359         1.10          5
                Adverb       743       89          1.15          3
TRIP            Noun         45,457    15,392      1.80          22
                Verb         11,924    6,607       2.87          52
                Adjective    22,316    7,782       1.83          22
                Adverb       2,488     694         1.42          12

Table 6.8: Thesauri comparison in terms of words.

After the assignments, the number of words grows and the number of synsets becomes slightly lower. This might seem strange, but some synsets in TeP are very similar to each other: after the assignments, they become the same synset, and one of them is discarded. Furthermore, as expected, ambiguity becomes higher at this stage. As the number of synsets is roughly the same but there are more words, some words are added to more than one synset. The synsets also become larger, as they are augmented.
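The measures reported in Tables 6.8 and 6.9 can be computed directly from a thesaurus represented as a collection of synsets. The following is a minimal sketch, assuming that each synset is a plain set of words. The function names (`word_stats`, `synset_stats`, `assign`) and the toy data are hypothetical and serve only as an illustration; this is not the implementation used in this work. The sketch also shows how two similar synsets may become identical after assignments, so that one of them is discarded.

```python
from collections import Counter


def word_stats(synsets):
    """Word-level measures: Total, Ambiguous, Avg(senses), Max(senses)."""
    # A word has one sense per synset that contains it.
    senses = Counter(word for synset in synsets for word in synset)
    total = len(senses)
    return {
        "total": total,
        "ambiguous": sum(1 for n in senses.values() if n > 1),
        "avg_senses": sum(senses.values()) / total,
        "max_senses": max(senses.values()),
    }


def synset_stats(synsets, large=25):
    """Synset-level measures: Total, Avg(size), size = 2, size > large, max(size)."""
    sizes = [len(synset) for synset in synsets]
    return {
        "total": len(sizes),
        "avg_size": sum(sizes) / len(sizes),
        "size_2": sum(1 for n in sizes if n == 2),
        "size_gt_large": sum(1 for n in sizes if n > large),
        "max_size": max(sizes),
    }


def assign(synsets, additions):
    """Add each (synset index, word) pair, then discard duplicate synsets."""
    augmented = [set(synset) for synset in synsets]
    for index, word in additions:
        augmented[index].add(word)
    # frozenset is hashable, so identical synsets collapse into one.
    return [set(synset) for synset in {frozenset(s) for s in augmented}]


# Toy thesaurus (hypothetical grouping of words, for illustration only).
before = [
    {"caos", "desordem"},
    {"caos", "confusão"},
    {"bebedeira", "embriaguez"},
    {"labirinto", "dédalo"},
]
# Hypothetical assignments: (index of the target synset, word to add).
additions = [(0, "confusão"), (1, "desordem"), (2, "porre"),
             (3, "confusão"), (3, "caos")]
after = assign(before, additions)

print(word_stats(before), synset_stats(before))
print(word_stats(after), synset_stats(after))
```

In this toy case, the first two synsets become identical after the assignments and are merged, so the thesaurus ends up with one synset fewer, one more word, more ambiguous words and larger synsets.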
The thesaurus obtained after clustering is smaller and much less ambiguous than the others. Besides the high threshold (θ = 0.5), this happens because the words not covered by TeP tend to be less frequent, and less frequent words are typically more specific and thus less ambiguous. Nevertheless, for nouns, there is still a synset with 31 words.

The words in TRIP are slightly more ambiguous than in TeP, and the synsets of TRIP are also larger than TeP's. It is clear that TRIP is much larger than TeP. It contains about two and a half times as many noun and adverb lexical items, about 3,500 more verbs and 8,000 more adjectives. The higher number of synsets means that the new thesaurus is also broader in terms of covered natural language concepts. On the other hand, the new thesaurus is more ambiguous and has larger synsets. For instance, it has almost 600 synsets with more than 25 words, which can be seen as too large to be practical (Borin and Forsberg, 2010). TeP has just 66 such synsets. Nevertheless, we have looked at the largest synsets of TRIP and noticed that most of them are well-formed, as they only contain synonymous words.
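The comparison between TRIP and TeP can be checked against the totals reported in Tables 6.8 and 6.9. The short sketch below only re-derives the ratios, differences and counts mentioned in the text; the dictionary layout and the variable names are assumptions made for illustration.

```python
# Word totals per POS, copied from Table 6.8.
tep_words = {"noun": 17149, "verb": 8280, "adjective": 14568, "adverb": 1095}
trip_words = {"noun": 45457, "verb": 11924, "adjective": 22316, "adverb": 2488}

# Number of synsets with more than 25 words per POS, copied from Table 6.9.
tep_large = {"noun": 0, "verb": 48, "adjective": 18, "adverb": 0}
trip_large = {"noun": 226, "verb": 193, "adjective": 161, "adverb": 1}

# How many times larger TRIP is, and how many more words it has, per POS.
ratios = {pos: trip_words[pos] / tep_words[pos] for pos in tep_words}
differences = {pos: trip_words[pos] - tep_words[pos] for pos in tep_words}

for pos in tep_words:
    print(f"{pos}: x{ratios[pos]:.2f}, +{differences[pos]:,} words")

print("synsets with more than 25 words:",
      sum(tep_large.values()), "in TeP,",
      sum(trip_large.values()), "in TRIP")
```

With these figures, TRIP has about 2.65 times as many nouns and 2.27 times as many adverbs as TeP, 3,644 more verbs and 7,748 more adjectives, and 581 synsets with more than 25 words against 66 in TeP.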
6.4. A large thesaurus for Portuguese 109

                             Synsets
Thesaurus       POS          Total     Avg(size)   size = 2   size > 25   max(size)
TeP 2.0         Noun         8,254     3.56        3,083      0           21
                Verb         3,899     5.71        907        48          53
                Adjective    6,062     3.5         3,032      18          43
                Adverb       497       2.87        258        0           9
1st iteration   Noun         8,126     7.00        1,227      203         125
                Verb         3,639     8.84        406        189         131
                Adjective    5,945     5.04        1,923      89          87
                Adverb       494       5.28        103        1           27
2nd iteration   Noun         8,126     7.15        1,227      225         129
                Verb         3,639     8.87        406        193         132
                Adjective    5,914     6.05        1,806      161         117
                Adverb       494       5.41        103        1           27
Clusters        Noun         8,879     2.70        4,765      1           31
                Verb         801       2.54        467        0           5
                Adjective    2,063     2.50        1,325      0           8
                Adverb       319       2.68        167        0           7
TRIP            Noun         16,936    4.84        5,986      226         131
                Verb         4,424     7.75        873        193         132
                Adjective    7,948     5.14        3,127      161         117
                Adverb       813       4.34        270        1           27

Table 6.9: Thesauri comparison in terms of synsets.

As a curiosity, the largest noun synsets of TRIP refer to concepts that have several figurative and (most of the time) slang synonyms, typically used as insults.
For instance, the following are the three largest noun synsets, which denote, respectively, disorder/confusion, alcoholic intoxication, and an imbecile/stupid person:
• furdúncio, aldrabice, matalotagem, fuzuê, rondão, desfeita, vergonha, sobresalto, salada russa, borogodó, latomia, trapizarga, tranquibérnia, alarma, debandada, atabalhoação, siricutico, desorganização, miscelânea, turvação, sarapatel, valverde, equívoco, recacau, canvanza, caravançarai, bafafá, atarantação, baderna, baralha, baralhada, cancaburra, rebuliço, salgalhada, barafunda, abstrusidade, mistifório, assarapantamento, rebúmbio, trapalhice, brenha, roldão, sarrabulhada, caos, dédalo, estrilho, revolvimento, enovelamento, trapalhada, barulho, kanvuanza, javardice, embrolho, desordem, desmanho, vasqueiro, forrobodó, garabulha, timaca, pastelada, zona, anarquia, confusão, rodilhão, floresta, bolo, complicação, feijoada, remexida, amalgamação, sarilho, saricoté, atrapalhação, feira, foguete, marafunda, salsada, cambulha, sarrabulho, desarranjo, pipoco, atropelamento, mixórdia, arranca-rabo, babel, inferno, pessegada, imbróglio, marmelada, choldraboldra, ensalsada, vuvu, bambá, caldeirada, mastigada, maka, ataranto, encrequilha, baixaria, sururu, cegarrega, zorra, salada, atabalhoamento, mexida, badanal, escangalho, precipitação, chirinola, enredo, vira-teimão, rolo, cu-de-boi, desarrumação, embrulhada, indistincção, estricote, envolta, salseiro, enredia, mexedura, atropelo, bagunça, fula-fula, misturada, desconcerto, labirinto, cambulhada, cafarnaum
• torcida, embriagamento, veneno, mona, zurca, trapisonda, lontra, rosca, perua, raposada, rola, tertúlia, carraspana, peleira, pizorga, cabra, chuva, tachada, caroça, ardina, girgolina, égua, carrega, zerenamora, rasca, touca, venena, gardunho, ema, porre, ebriez, carapanta, chiba, ebriedade, bico, inebriamento, bebedeira, carrapata, penca, taçada, canja, garça, ganso, tortelia, turca, cabrita, mela, resina, senisga,
bebedice, bezana, vinhaça, zangurrina, bêbeda, bibra, borrachice, zuca, coca, torta, doninha, piela, graxa, trabuzana, água, cegonha, gateira, bicancra, samatra, galinhola, gata, pala, ganza, pifão, bode, cobra, prego, zola, nêspera, narda, parrascana, vinho, gardinhola, tropecina, embriaguez, cardina, tiorga, temulência, narceja, pisorga, grossura, dosa, trovoada, carneira, perunca, bruega,