Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...


Chapter 3. Related Work

English concepts that do not have a Portuguese equivalent (Santos et al., 2010). This happens, for instance, with the Princeton WordNet concepts of human action or magnitude relation, which are aligned to a GAP! in MWN.PT. Moreover, the translation approach tends not to cover specific lexicalisations of the target language. This points to a serious problem in translating a wordnet into a different language: the particular semantics of a word in one language might differ significantly from those of its translation equivalent in another language (Cruse, 1986). Moreover, as different languages represent different socio-cultural realities, they do not cover exactly the same part of the lexicon and, even where they seem to be common, several concepts are lexicalised differently (Hirst, 2004).

An alternative approach is followed for WordNet.Br, where the concepts are created from scratch for Portuguese and only the relations of the translation equivalents are inherited from Princeton WordNet. Although this approach should not result in lexical gaps, in our view it does not guarantee that no inconsistent relations are generated once the translation equivalents are selected.

Not sense aware: PAPEL is the only LKB referred to here that is not structured in synsets and is not sense-aware. Since language is ambiguous, failing to discriminate between different senses of the same word is a limitation in several NLP tasks. Moreover, even though PAPEL is the resource with the most relation instances, this number would surely be lower if it were structured in synsets, as some words would be grouped together.

Usage restrictions: Not all of these LKBs are freely available for utilisation and integration in other systems or applications. Although part of WordNet.PT is available for online queries [19], at the time of writing this thesis it was not publicly available for download. MWN.PT is also available for queries through two interfaces [20].
It is not free, but a commercial or an academic license can be bought. Only the synset-base of WordNet.Br is freely available, through TeP. The relations are not, but it is possible to query online for its information on verbs [21].

[19] WordNet.PT can be queried online, through http://www.clul.ul.pt/wn/ (August 2012).
[20] The Visuwords interface for MWN.PT is available from http://mwnpt.di.fc.ul.pt/ (August 2012). The MultiWordNet interface is available from http://multiwordnet.fbk.eu/online/multiwordnet.php (August 2012).
[21] See http://caravelas.icmc.usp.br/wordnetbr/index.html (September 2012).

3.2 Lexical-Semantic Information Extraction

Lexical-Semantic Information Extraction (LSIE) is a special kind of IE where relations hold between word senses, typically identified by lexical items, rather than between concepts or named entities. This means that LSIE deals mainly with the acquisition of lexical-semantic relations.

Since the 1970s, before the creation of Princeton WordNet, researchers have been exploiting textual resources and developing techniques for the automatic extraction of lexical-semantic knowledge, which could be used in the automatic creation of a broad-coverage LKB. It is thus no surprise that electronic dictionaries were the primary resources exploited for LSIE (see Calzolari et al. (1973) and Amsler (1980)). Language dictionaries are repositories that compile words and expressions of a language. They are substantial sources of general lexical knowledge (Briscoe, 1991) and “authorities” of word senses (Kilgarriff, 1997), which are described in textual definitions written by lexicographers, the experts in the field.

Despite several attempts at the automatic creation of a broad-coverage LKB for English, Princeton WordNet, a manual effort, ended up being the leading resource of this kind (Sampson, 2000). As discussed in section 3.1.1, the existence of a wordnet in one language has a positive impact on the development of NLP tools for that language. Nevertheless, despite the wide acceptance of WordNet, research on LSIE continues, not only from dictionaries, but especially from corpora and other unstructured resources, whether for the enrichment of WordNet (see section 3.3) or for the creation of alternative LKBs, including LKBs in non-English languages.

In this section, we start with a brief chronology of LSIE from dictionaries. Then, we present work on LSIE from corpora and IE from other unstructured textual resources.

3.2.1 Information Extraction from Electronic Dictionaries

In the beginning

During the 1970s, and throughout the 1980s, electronic dictionaries started to be the target of empirical studies (e.g. Calzolari et al. (1973); Amsler (1980); Michiels et al. (1980)), with a view to exploiting them in the automatic construction of an LKB. This kind of knowledge base would ease the access to morphological and semantic information about the defined words (Calzolari et al., 1973), which would then be very useful in performing NLP tasks.

These earlier works confirmed that the vocabulary used in dictionaries is limited, which makes them easier to process for obtaining semantic or syntactic relations (Michiels et al., 1980).
They concluded that the textual definitions are often structured on a genus and a differentia (Amsler, 1980):

• The genus identifies the superordinate concept of the definiendum: the definiendum is an instance or a “type of” the genus, which means there is a hyponymy relation between the former and the latter.
• The differentia contains the specific properties for distinguishing the definiendum from other instances of the superordinate concept.

Since this kind of structure is well suited to the automatic acquisition of taxonomies, Amsler (1981) proposes a taxonomy for English nouns and verbs. The extracted structures, dubbed tangled hierarchies, were created after the analysis of dictionary definitions and manual disambiguation of the head word of each definition. Amsler (1981) concluded that dictionaries clearly represent two taxonomic relations: is-a (hypernymy) and is-part (part-of).

Calzolari (1984) suggests a set of frequent patterns in dictionary definitions, and examines the occurrence of the hyponymy and “restriction” relations. She claims that hyponymy is the most important and evident relation in the lexicon and confirms that it can be easily extracted from a dictionary, after identifying the genus and the differentia.

Markowitz et al. (1986) identified a set of textual patterns that occur in the beginning of the definitions of a dictionary. Those patterns are used to denote relations
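
As an illustration of the genus and differentia structure discussed in this section, the following is a minimal sketch in Python. It is not the procedure of any of the works cited here. It assumes simple English noun definitions of the form "a/an MODIFIERS GENUS that/with/of ...", and takes the last word before the first differentia marker as the genus. The marker list, the function names and the relation label `hyponym_of` are hypothetical choices made only for this example.

```python
import re

# Assumption: a small, hand-picked list of words that often open the
# differentia in simple English noun definitions. Purely illustrative.
DIFFERENTIA_MARKERS = {"that", "which", "who", "whose", "with", "of",
                       "for", "used", "having", "in", "on"}
ARTICLES = {"a", "an", "the"}


def split_definition(definition):
    """Split a definition into (genus, premodifiers, postmodifier).

    The genus is guessed as the last word before the first differentia
    marker. Returns None as the genus when no head word is found.
    """
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", definition.lower())
    if tokens and tokens[0] in ARTICLES:
        tokens = tokens[1:]  # drop the leading article
    # Index of the first marker, or the end of the definition if none occurs.
    cut = next((i for i, tok in enumerate(tokens)
                if tok in DIFFERENTIA_MARKERS), len(tokens))
    head_phrase = tokens[:cut]
    postmodifier = " ".join(tokens[cut:])
    if not head_phrase:
        return None, [], postmodifier
    return head_phrase[-1], head_phrase[:-1], postmodifier


def hyponymy_triples(dictionary):
    """Turn {definiendum: definition} into (definiendum, relation, genus) triples."""
    triples = []
    for word, definition in dictionary.items():
        genus, _, _ = split_definition(definition)
        if genus is not None:
            triples.append((word, "hyponym_of", genus))
    return triples
```

Calling `hyponymy_triples` on a small dictionary mapping each definiendum to its definition yields one hyponymy candidate per entry whose definition has a recognisable head word. The heuristic is deliberately naive: it does not disambiguate the genus, and it mishandles definitions that open with expressions such as "a type of", where the word before the marker is not the genus.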