Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...

48 Chapter 3. Related Work

6. NP {,} especially {NP ,}* {and | or} NP
... most European countries, especially France, England, and Spain.
⇒ {France hyponym of European country}, {England hyponym of European country}, {Spain hyponym of European country}

Inspired by the work of Hearst (1992), Freitas (2007) discusses the extraction of hypernymy relations from Portuguese corpora. In her work, some Hearst patterns were adapted to Portuguese, resulting in the following (English glosses in parentheses):

• NP {tais} como {NP, ... , (e | ou) NP} ("such as")
A tentativa posterior de clonar outros mamíferos tais como camundongos, porcos, bezerros, ... ("The subsequent attempt to clone other mammals such as mice, pigs, calves, ...")
⇒ {camundongos hyponym of mamíferos}, {porcos hyponym of mamíferos}, {bezerros hyponym of mamíferos}
• {NP, }* NP {,} (e | ou) outros NP ("and other")
... a experiência subjetiva com o LSD-25 e outros alucinógenos. ("... the subjective experience with LSD-25 and other hallucinogens.")
⇒ {LSD-25 hyponym of alucinógeno}
• tipos de NP: {NP, ... ,} NP (e | ou) NP ("types of")
Existem dois tipos de cromossomos gigantes: cromossomos politênicos e cromossomos plumulados. ("There are two types of giant chromosomes: polytene chromosomes and lampbrush chromosomes.")
⇒ {cromossomos politênicos hyponym of cromossomos}, {cromossomos plumulados hyponym of cromossomos}
• NP chamad(o|os|a|as) {de} NP ("called")
... a alta frequência da doença mental chamada esquizofrenia. ("... the high frequency of the mental illness called schizophrenia.")
⇒ {esquizofrenia hyponym of doença mental}

Also for the extraction of hypernyms, Caraballo (1999) proposed combining pattern detection with a clustering method in which noun candidates are obtained from a corpus using data on conjunctions and appositives. A co-occurrence matrix for all nouns is used: it contains a vector for each noun in the corpus, holding the number of times that noun co-occurs, in a conjunction or appositive, with each other noun. If v and w are the vectors of two nouns, the similarity between them is computed with the cosine similarity, which can be seen as a variant of LSA:

$$\cos(v, w) = \frac{v \cdot w}{|v|\,|w|} \tag{3.3}$$

In a post-processing step, Hearst-like patterns are used to find hypernym candidates, which, if appropriate, are placed as common parent nodes of the clusters.
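A minimal sketch of Equation (3.3) follows, in Python. It assumes that pairs of nouns found together in conjunctions or appositives have already been extracted (the toy pairs below are hypothetical, not Caraballo's data), builds one sparse co-occurrence vector per noun, and ranks a noun's neighbours by cosine similarity. The names `PAIRS`, `build_vectors`, `cosine` and `most_similar` are illustrative only; neither Caraballo's clustering step nor the dimensionality reduction of full LSA is reproduced.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy input: noun pairs assumed to have been found together in a
# conjunction ("X and Y") or an appositive; no parser is involved here.
PAIRS = [
    ("nutmeg", "cinnamon"),
    ("cinnamon", "cloves"),
    ("cloves", "coriander"),
    ("nutmeg", "cloves"),
    ("ship", "boat"),
]


def build_vectors(pairs):
    """Map each noun to a sparse vector of co-occurrence counts with other nouns."""
    vectors = defaultdict(lambda: defaultdict(int))
    for a, b in pairs:
        if a == b:
            continue
        # Co-occurrence is symmetric, so update both rows of the matrix.
        vectors[a][b] += 1
        vectors[b][a] += 1
    return vectors


def cosine(v, w):
    """Cosine similarity (Equation 3.3) between two sparse count vectors."""
    dot = sum(count * w.get(key, 0) for key, count in v.items())
    norm_v = sqrt(sum(c * c for c in v.values()))
    norm_w = sqrt(sum(c * c for c in w.values()))
    if norm_v == 0 or norm_w == 0:
        return 0.0  # a noun with no co-occurrences is similar to nothing
    return dot / (norm_v * norm_w)


def most_similar(vectors, noun, k):
    """Return the k nouns most similar to `noun`, as (word, score) pairs."""
    target = vectors.get(noun, {})
    scored = [
        (other, cosine(target, vectors[other]))
        for other in list(vectors)
        if other != noun
    ]
    # Sort by decreasing similarity, breaking ties alphabetically.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


if __name__ == "__main__":
    vecs = build_vectors(PAIRS)
    print(most_similar(vecs, "nutmeg", 2))
```

With the toy pairs, `nutmeg` comes out closest to `coriander`, because coriander's only co-occurrence (with `cloves`) is shared by nutmeg; a clustering step could group nouns using exactly this kind of neighbour ranking.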
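The Portuguese patterns of Freitas (2007) can be approximated with regular expressions. The sketch below, in Python, covers only the `{tais} como` pattern and rests on a strong simplifying assumption: every NP is a single word, whereas a real extractor would delimit noun phrases with a POS tagger or chunker. The names `TAIS_COMO`, `SEPARATORS` and `extract_tais_como` are hypothetical.

```python
import re

# Simplifying assumption: each NP is a single word (no tagger or chunker).
# Group 1 is the hypernym; group 2 is the enumeration after "(tais) como".
# The lookahead stops the comma-separated list before a conjunction.
TAIS_COMO = re.compile(
    r"(\w+),? (?:tais )?como (\w+(?:, (?!(?:e|ou) )\w+)*(?:,? (?:e|ou) \w+)?)"
)

# Separators inside the enumeration: a conjunction ("e" / "ou"), optionally
# preceded by a comma, or a plain comma.
SEPARATORS = re.compile(r",?\s+(?:e|ou)\s+|,\s*")


def extract_tais_como(sentence):
    """Return (hyponym, hypernym) pairs matched by the 'tais como' pattern."""
    relations = []
    for match in TAIS_COMO.finditer(sentence):
        hypernym, enumeration = match.group(1), match.group(2)
        for hyponym in SEPARATORS.split(enumeration):
            if hyponym:
                relations.append((hyponym, hypernym))
    return relations


if __name__ == "__main__":
    example = ("A tentativa posterior de clonar outros mamíferos "
               "tais como camundongos, porcos, bezerros")
    print(extract_tais_como(example))
```

Applied to the mammal example, it returns one (hyponym, hypernym) pair per enumerated word; like the pattern itself, it does not lemmatise, so the hypernym stays in the plural (`mamíferos`).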

Cederberg and Widdows (2003) used a similar variant of LSA to improve the precision and recall of hyponymy relations extracted from a corpus with Hearst-like patterns. Since a hyponym and its hypernym are expected to be similar, LSA is used to compute the similarity between the terms of each extracted relation. While the precision of a random sample of extracted relations was 40%, the precision of the 100 relations with the highest similarity was 58%, which suggests that the method is effective at reducing errors. Furthermore, as most of the potential hyponymy relations that could be extracted are not expressed by the six Hearst patterns, Cederberg and Widdows (2003) improved the recall of their method by using coordination as a cue for similarity.

3.2. Lexical-Semantic Information Extraction 49

To illustrate their assumptions, they give the following sentences, taken from the British National Corpus [23] (BNC):

1. This is not the case with sugar, honey, grape must, cloves and other spices which increase its merit.
⇒ {clove hyponym of spice}
2. Ships laden with nutmeg or cinnamon, cloves or coriander once battled the Seven Seas to bring home their precious cargo.
⇒ {nutmeg hyponym of spice}
⇒ {cinnamon hyponym of spice}
⇒ {coriander hyponym of spice}

The first sentence matches a Hearst pattern directly. The second does not mention spices at all, but since cloves are coordinated with nutmeg, cinnamon and coriander, the hypernym found for clove in the first sentence can be extended to these words.

Starting from the correct relations extracted without the LSA filter, the ten words most similar to each hyponym were collected and tested for having the same hypernym. This resulted in a slight improvement in precision, while the number of relations obtained was ten times higher.

Berland and Charniak (1999) present work on the extraction of part-of relations from a corpus using handcrafted patterns. In a similar fashion to Hearst (1992), seed instances are used to infer linguistic patterns, which are then used to acquire new relation instances. Finally, the extracted instances are ranked according to their log-likelihood (Dunning, 1993).

Girju and Moldovan (2002) followed Hearst's method to discover lexical-syntactic patterns expressing causation. Given that only some categories of nouns (e.g. states of affairs) can be associated with causation, the extracted relations were subsequently validated against semantic constraints on the relation arguments.

Cimiano and Wenderoth (2007) present an approach for the automatic acquisition of qualia structures (Pustejovsky, 1991), which aim to describe the meaning of lexical elements (presented earlier, in section 2.2.5 of this thesis). To mitigate the problem of data sparseness, they propose looking for discriminating patterns on the Web. For each term whose qualia structure is sought, a set of search engine queries is generated for each qualia role, based on known lexical-syntactic patterns. The first 50 snippets returned are downloaded and POS-tagged.
Then, patterns defined over POS tags and conveying the qualia role of interest are matched to obtain candidate qualia elements. Finally, the candidates are weighted and ranked according to well-known similarity measures (e.g. the Jaccard coefficient, PMI).

The main limitation of the aforementioned approaches is that they rely on a finite set of handcrafted rules (some of them discovered with the help of automatic procedures) and are therefore vulnerable to data sparseness. Even though Hearst (1992) notes that the six proposed patterns occur frequently, they are unlikely to capture all the occurrences of the target relation(s). Regarding the manual identification of semantic patterns, Snow et al. (2005) add that it is not a very interesting task and that it can be biased by the designer. They propose a supervised approach, trained with WordNet, to discover hyponymy patterns, together with an automatic classifier that decides whether a hypernymy relation holds between two nouns. Their procedure works as follows:

[23] See http://www.natcorp.ox.ac.uk/ (August 2012)