Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...

Chapter 7 Moving from term-based to synset-based relations

Typical information extraction (IE) systems are capable of acquiring concept instances and information about these concepts from large collections of text. Whether these systems aim at the automatic acquisition of lexical-semantic relations (e.g. Chodorow et al. (1985); Hearst (1992); Pantel and Pennacchiotti (2006)), of knowledge on specific domains (e.g. Pustejovsky et al. (2002); Wiegand et al. (2012)), or of open-domain facts (e.g. Agichtein and Gravano (2000); Banko et al. (2007); Etzioni et al. (2011)), they typically represent concepts as terms, which are lexical items identified by their lemma. This is also how CARTÃO is structured. There, semantic relations are denoted by relational triples t = {a R b}, where the arguments (a and b) are terms whose meanings are connected by a relation described by R. As we have done throughout this thesis, we refer to this representation as term-based triples (tb-triples).

The problem is that a simple term is usually not enough to refer unambiguously to a concept, because the same word might have different meanings and different words might have the same meaning. On the one hand, this problem is not severe in the extraction of domain knowledge, where, based on the “one sense per discourse” assumption (Gale et al., 1992), ambiguity is low. On the other hand, when dealing with broad-coverage knowledge, if ambiguities are not handled, it becomes impractical to formalise the extracted information and to accomplish tasks such as inference for discovering new knowledge. Therefore, to make IE systems more useful, a new step, which can be seen as a kind of WSD, is needed.
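To make the ambiguity concrete, the following is a minimal sketch of a tb-triple t = {a R b} and of what happens when its arguments are looked up in a sense inventory. It is an illustration only, not the thesis's implementation: the class `TbTriple`, the helper functions, the relation label and the English toy lemmas and synsets are all invented for this example.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class TbTriple:
    """A term-based triple t = {a R b}; a and b are plain lemmas."""
    a: str
    relation: str
    b: str


# Hypothetical toy sense inventory: each frozenset groups lemmas sharing one sense.
SYNSETS = [
    frozenset({"bank", "riverbank"}),
    frozenset({"bank", "financial institution"}),
    frozenset({"institution", "organisation"}),
]


def candidate_synsets(term, synsets):
    """Return every synset that lists the term as one of its lexicalisations."""
    return [s for s in synsets if term in s]


def candidate_sense_pairs(triple, synsets):
    """Enumerate all (sense of a, sense of b) pairs the triple could refer to."""
    return list(product(candidate_synsets(triple.a, synsets),
                        candidate_synsets(triple.b, synsets)))


# Invented example triple: "institution" is a hypernym of "bank".
triple = TbTriple("institution", "hypernym_of", "bank")
pairs = candidate_sense_pairs(triple, SYNSETS)
print(len(pairs))
```

In this toy setting the term "bank" belongs to two senses, so the single tb-triple yields two candidate sense pairs, and nothing in the triple itself indicates which one the relation holds for.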
Originally baptised as ontologising (Pantel, 2005), this step aims at moving from knowledge structured in terms, identified by their orthographical form, towards an ontological structure, organised in concepts. This is done by associating the terms with a representation of their meaning.

After the steps presented in the previous chapters, we are left with a lexical network, CARTÃO, with tb-triples extracted from text (chapter 4), and with a thesaurus, with synsets (chapters 5 and 6). While the synsets can be seen as concepts and their possible lexicalisations, identifying the correct sense(s) of the arguments of a tb-triple for which the relation is valid is not straightforward. Moreover, whereas most WSD techniques rely on the context in which the words to be disambiguated occur to find their most adequate sense, the tb-triples do not provide their extraction context. While we could recover the context for some of
