Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...

Chapter 4. Acquisition of Semantic Relations

... where transitivity was applied to the synonymy relations of PAPEL, giving rise to inconsistencies such as the following:

• queda synonym-of ruína ∧ queda synonym-of habilidade → ruína synonym-of habilidade

The problem occurs because one sense of queda is the result of falling, while another means having some skill. Combining the two, we obtain that ruína (ruin) is the same as habilidade (ability, skill), which are almost opposites.

Nevertheless, since the beginning of the PAPEL project, our option was to build a lexical resource where lexical items were not divided into word senses. That early option relied on the following:

• From a linguistic point of view, word senses are not discrete and cannot be separated with clear boundaries (Kilgarriff, 1996; Hirst, 2004). Sense division in dictionaries and lexical ontologies is most of the time artificial.

• Following the previous point, the sense granularity in dictionaries and lexical ontologies often differs from lexicographer to lexicographer. As there are no well-defined criteria for the division of meanings, word senses in different resources do not always match (Dolan, 1994; Peters et al., 1998).

• Word sense disambiguation (WSD, see Navigli (2009b) for a survey) is the task of, given the context where a word occurs, selecting the most adequate of its senses from a sense inventory. However, the previous points confirm that WSD is an ill-defined task and is very dependent on the purpose (Wilks, 2000).

• Dictionaries do not provide the sense corresponding to a word occurring in a definition. After the first version of PAPEL was released, Navigli (2009a) actually presented a method for disambiguating words in dictionary definitions. Still, given the aforementioned problems with WSD, the term-based structure of PAPEL was kept.

• Finally, in natural language, the study of vagueness is as important as, or even more important than, the study of ambiguity (see e.g. Santos (1997)).

When we started to extract relations from other dictionaries (and thesauri), we confirmed that the senses of words occurring in more than one resource did not match across resources. Moreover, not all definitions in Wiktionary.PT have a sense number, and synonymy lists do not always indicate the corresponding synonymous sense.

Since we are extracting information from more than one lexical resource, an alternative would be to align the word senses in different resources (represented as definitions in dictionaries or synsets in thesauri), as others did (e.g. Vossen et al. (2008); Henrich et al. (2012)). Still, given the aforementioned utility of a lexical resource such as PAPEL, we decided to keep CARTÃO as a term-based resource.

In the following chapters, we explain how the structure of CARTÃO can evolve into a resource that handles word senses. After the additional steps of the ECO approach, the result is Onto.PT, a resource structured in synsets. We recall that this approach is flexible in that it enables the construction (and further augmentation) of a wordnet, based on the integration of knowledge from multiple heterogeneous sources and, from this point on, it does not require an additional analysis of the extraction context. The only requirement is that the initial information is represented as tb-triples, which is a fairly standard representation.
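The inconsistency discussed in this section can be reproduced with a few lines of code. The following is a minimal sketch, assuming that tb-triples are stored as plain (word, relation, word) tuples; the tuple layout and the function names synonym_closure and inferred_pairs are hypothetical and are not taken from PAPEL or CARTÃO. It treats synonymy as symmetric and transitive and lists the synonym pairs that only exist because of that closure.

```python
from itertools import combinations

# Term-based synonymy triples (tb-triples): plain words, no sense information.
# These two triples are the ones from the queda example in the text.
tb_triples = [
    ("queda", "synonym-of", "ruína"),
    ("queda", "synonym-of", "habilidade"),
]


def synonym_closure(triples):
    """Group words into classes, treating synonymy as symmetric and transitive."""
    parent = {}

    def find(word):
        # Union-find lookup with path halving.
        parent.setdefault(word, word)
        while parent[word] != word:
            parent[word] = parent[parent[word]]
            word = parent[word]
        return word

    for a, rel, b in triples:
        if rel == "synonym-of":
            parent[find(a)] = find(b)

    classes = {}
    for word in list(parent):
        classes.setdefault(find(word), set()).add(word)
    return list(classes.values())


def inferred_pairs(triples):
    """Return synonym pairs implied by the closure but never stated in the input."""
    stated = {frozenset((a, b)) for a, rel, b in triples if rel == "synonym-of"}
    implied = set()
    for cls in synonym_closure(triples):
        for a, b in combinations(sorted(cls), 2):
            pair = frozenset((a, b))
            if pair not in stated:
                implied.add(pair)
    return implied


for pair in inferred_pairs(tb_triples):
    print(" synonym-of ".join(sorted(pair)))
```

Because the triples carry no sense information, the closure cannot tell the two senses of queda apart, so the only pair it adds is the undesired one between ruína and habilidade.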
Chapter 5. Synset Discovery

As mentioned in the previous chapter, an LKB structured in words, instead of concepts, does not handle lexical ambiguity and might lead to serious inconsistencies. To deal with that issue, wordnets are structured in synsets, which are groups of words sharing a common meaning and thus representing a concept. This chapter is about the discovery of synsets from a term-based LKB, which is the first step towards a sense-aware resource. Since a synset groups words according to their synonymy, in this step we only use the network established by the synonymy triples extracted from dictionaries.

On the one hand, co-occurrence graphs extracted from corpora have been shown to be useful for identifying not only synonymous words, but also word senses (Dorow, 2006). It should be mentioned that, unlike words connected by other kinds of relation, synonymous words share similar neighbourhoods but may not co-occur frequently in corpus text (Dorow, 2006), which leads to few textual patterns connecting this kind of words. So, as mentioned in section 3.2.2, most works on synonymy (or near-synonymy) extraction from corpora rely on the application of mathematical models (e.g. Turney (2001)), including graphs, clustering algorithms, or both (e.g. Dorow (2006)).

On the other hand, in synonymy networks extracted from dictionaries, clusters tend to express concepts (Gfeller et al., 2005) and can therefore be exploited for the establishment of synsets. Methods for improving the organisation of synonymy graphs, extracted from different resources, are presented by Navarro et al. (2009).

As other authors noticed for PAPEL (Prestes et al., 2011), we confirmed that synonymy networks extracted from dictionaries connect more than half of the words by at least one path. Therefore, as others did for discovering new concepts from text (e.g. Lin and Pantel (2002)), we used a (graph) clustering algorithm on our synonymy networks. This kind of work is related to WSD. More specifically, it can be seen as word sense induction (WSI, Navigli (2012)), as it discovers possible concepts of a word without exploiting an existing sense inventory.

As discussed in section 4.3, from a linguistic point of view, word senses are not discrete, so their representation as crisp objects does not reflect human language. A more realistic approach for coping with this fact is to represent synsets as models of uncertainty, such as fuzzy sets, to handle word senses and natural language concepts. Our clustering algorithm can be used for the discovery of fuzzy synsets. The fuzzy membership of a word in a synset can be interpreted as the confidence level about using this word to indicate the meaning of the synset.
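To make the notion of fuzzy membership concrete, the following is a minimal sketch and not the clustering algorithm of this thesis, which is described later in the chapter. It assumes that the membership of a word in the fuzzy synset centred on another word can be approximated by the cosine overlap of their closed neighbourhoods in the synonymy network. The edge list is invented for illustration (only queda, ruína and habilidade come from the text), and the function names and the 0.5 cut-off are hypothetical.

```python
from math import sqrt

# Toy synonymy network. Only queda, ruína and habilidade come from the text;
# decadência, jeito and all of the edges are invented for illustration.
edges = [
    ("queda", "ruína"), ("queda", "decadência"), ("ruína", "decadência"),
    ("queda", "habilidade"), ("queda", "jeito"), ("habilidade", "jeito"),
]


def closed_neighbourhoods(edge_list):
    """Map each word to the set containing itself and its direct synonyms."""
    nbrs = {}
    for a, b in edge_list:
        nbrs.setdefault(a, {a}).add(b)
        nbrs.setdefault(b, {b}).add(a)
    return nbrs


def similarity(nbrs, a, b):
    """Cosine overlap between the closed neighbourhoods of two words."""
    shared = len(nbrs[a] & nbrs[b])
    return shared / sqrt(len(nbrs[a]) * len(nbrs[b]))


def fuzzy_synset(nbrs, centre):
    """Fuzzy synset centred on a word: word -> membership degree in (0, 1]."""
    degrees = {w: similarity(nbrs, centre, w) for w in nbrs}
    return {w: mu for w, mu in degrees.items() if mu > 0}


def crisp_synset(fuzzy, threshold):
    """Keep only the words whose membership reaches the (assumed) cut-off."""
    return {w for w, mu in fuzzy.items() if mu >= threshold}


nbrs = closed_neighbourhoods(edges)
for centre in ("ruína", "habilidade"):
    fuzzy = fuzzy_synset(nbrs, centre)
    print(centre, {w: round(mu, 2) for w, mu in fuzzy.items()})
    print("  crisp at 0.5:", sorted(crisp_synset(fuzzy, 0.5)))
```

In this toy network, queda obtains a high membership in both fuzzy synsets, reflecting its two senses, while ruína and habilidade fall below the cut-off in each other's synset and are therefore kept apart.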