A Wordnet from the Ground Up

3.4. Measures of Semantic Relatedness 71

features was also discussed in (Lapata, 2001; Boleda et al., 2004, 2005), but applied to the semantic classification of adjectives. We have identified three types of constraints as potential semantic descriptors of adjectives:
ANmod – an occurrence of a particular noun modified by the given adjective,
AAdv – an adverb in close proximity to the given adjective,
AA – the co-occurrence with an adjective that agrees in case, number and gender, as a potential co-constituent of the same noun phrase.

ANmod is symmetrical to the AdjC constraint used for nominal LUs, but this time the lexical elements are nouns instead of adjectives. AAdv is very similar to VAdv: the lexical elements are adverbs, and we test for the presence of an adverb at a distance not greater than 2. The implementation of AA, where the lexical elements are adjectival LUs, was based on the scheme of ANmod, but we look for an occurrence of another adjectival LU which agrees in case, number and gender and which can be a co-modifier of the same nominal LU.

The latter feature was advocated by Hatzivassiloglou and McKeown (1993) as expressing negative semantic information: only unrelated adjectives can sit in the same noun phrase. Our corpus data (collected from IPIC), however, suggest that this is too strong a bias. In addition, our AA constraint also accepts coordination of adjectives, in which case related adjectives can co-occur in a noun phrase. In the end, we used the AA feature in a positive way, just like the other features.
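To make the three adjective constraints more concrete, here is a minimal sketch over a toy, hand-tagged Polish phrase. Everything about the representation is hypothetical: the `Token` type, the coarse tags (`"adj"`, `"noun"`, `"adv"`) and the tag values (`"nom"`, `"sg"`, `"m3"`) are assumptions, ANmod is approximated by "the closest agreeing noun", and AA by "another agreeing adjective whose closest agreeing noun is the same". The actual constraints were written for the IPIC morphosyntactic annotation, and the RWF weighting is not shown.

```python
from collections import Counter
from typing import List, NamedTuple, Optional


class Token(NamedTuple):
    lemma: str
    pos: str                      # hypothetical coarse tags: "adj", "noun", "adv", "verb"
    case: Optional[str] = None
    number: Optional[str] = None
    gender: Optional[str] = None


def agrees(a: Token, b: Token) -> bool:
    """True if the two tokens share case, number and gender."""
    return (a.case, a.number, a.gender) == (b.case, b.number, b.gender)


def head_noun(sent: List[Token], i: int) -> Optional[int]:
    """Simplified ANmod head: index of the closest noun agreeing with the adjective at i."""
    nouns = [j for j, t in enumerate(sent) if t.pos == "noun" and agrees(t, sent[i])]
    return min(nouns, key=lambda j: abs(j - i)) if nouns else None


def adjective_features(sent: List[Token], i: int, max_adv_dist: int = 2) -> Counter:
    """Count ANmod, AAdv and AA features for the adjective at position i."""
    feats: Counter = Counter()
    head = head_noun(sent, i)
    if head is not None:
        # ANmod: the noun modified by the adjective.
        feats["ANmod:" + sent[head].lemma] += 1
    for j, t in enumerate(sent):
        if j == i:
            continue
        # AAdv: an adverb at a distance not greater than max_adv_dist.
        if t.pos == "adv" and abs(j - i) <= max_adv_dist:
            feats["AAdv:" + t.lemma] += 1
        # AA: another agreeing adjective that can co-modify the same noun;
        # counted as an ordinary positive feature.
        if (t.pos == "adj" and agrees(t, sent[i])
                and head is not None and head_noun(sent, j) == head):
            feats["AA:" + t.lemma] += 1
    return feats


# "Bardzo duży czerwony dom stoi" ("A very big red house stands"), tagged by hand.
sentence = [
    Token("bardzo", "adv"),
    Token("duży", "adj", "nom", "sg", "m3"),
    Token("czerwony", "adj", "nom", "sg", "m3"),
    Token("dom", "noun", "nom", "sg", "m3"),
    Token("stać", "verb"),
]
print(adjective_features(sentence, 1))
```

For *duży* the sketch yields one ANmod:dom, one AAdv:bardzo and one AA:czerwony feature. With `max_adv_dist=1`, *czerwony* would lose its AAdv:bardzo feature, because the adverb is two positions away.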
Features of all three types, weighted and filtered by the RWF weight function discussed in Section 3.4.4, were used to describe the contexts of occurrence of particular adjectives.

The AA constraint was applied in two different ways:
• as part of one large joint matrix together with the two other constraints: different parts (columns) of the row vectors are generated by different constraints, but the matrix is processed as a whole; this usage is denoted ANmod+AAdv+AA in Table 3.13,
• in two separate matrices: a joint one for ANmod+AAdv and another for AA only.

In the second case, the semantic relatedness values were calculated separately on the basis of each of the two separately processed matrices and then linearly combined (Broda et al., 2008):

$$\mathrm{MSR}_{Adj}(l_1, l_2) = \alpha\,\mathrm{MSR}_{ANmod+AAdv}(l_1, l_2) + \beta\,\mathrm{MSR}_{AA}(l_1, l_2) \tag{3.5}$$

The values of the coefficients were selected experimentally; α = β = 0.5 gave the best results.
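Equation (3.5) and the joint-matrix alternative can be contrasted in a minimal sketch. It assumes cosine similarity as the MSR computed on each matrix and skips the RWF transformation; the matrix names (`m_nav`, `m_aa`), the function names and the counts are all invented for illustration and do not come from the original experiments.

```python
import math
from typing import Dict

Vector = Dict[str, float]   # sparse row: feature name -> (weighted) value
Matrix = Dict[str, Vector]  # adjectival LU -> row vector


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is all zeros."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def msr_adj(l1: str, l2: str, m_nav: Matrix, m_aa: Matrix,
            alpha: float = 0.5, beta: float = 0.5) -> float:
    """Eq. (3.5): MSRs computed on the two separate matrices, linearly combined."""
    return alpha * cosine(m_nav[l1], m_nav[l2]) + beta * cosine(m_aa[l1], m_aa[l2])


def msr_joint(l1: str, l2: str, m_nav: Matrix, m_aa: Matrix) -> float:
    """ANmod+AAdv+AA: the column blocks are concatenated and one MSR is computed."""
    # Feature names carry their constraint prefix, so the two blocks never collide.
    row1 = {**m_nav[l1], **m_aa[l1]}
    row2 = {**m_nav[l2], **m_aa[l2]}
    return cosine(row1, row2)


# Made-up counts standing in for RWF-weighted values.
m_nav = {
    "duży": {"ANmod:dom": 3.0, "AAdv:bardzo": 1.0},
    "wielki": {"ANmod:dom": 2.0, "AAdv:bardzo": 2.0},
}
m_aa = {
    "duży": {"AA:czerwony": 1.0},
    "wielki": {"AA:czerwony": 1.0},
}
print(round(msr_adj("duży", "wielki", m_nav, m_aa), 3))    # 0.947
print(round(msr_joint("duży", "wielki", m_nav, m_aa), 3))  # 0.905
```

With these toy counts the two set-ups give different values for *duży* and *wielki*. The example only shows that they can diverge; it says nothing about which one works better.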
72 Chapter 3. Discovering Semantic Relatedness

During the experiments performed by Broda et al. (2008), a linear combination of separate matrices, that is, a linear combination of two MSRs, gave better results than the joint matrix ANmod+AAdv+AA. However, as extracting MSRs from a combination of separate matrices still requires more in-depth research, we do not repeat an experiment of this kind here.

The results of the manual evaluation of the constraints for nominal LUs, presented by Piasecki and Radziszewski (2009), appear in Table 3.6. For each constraint template and the appropriate list of lexical elements, the total number of matches in IPIC was calculated, and on that basis a sample of matches was drawn at random. Each match of the lexicalised morphosyntactic constraint in the sample was extracted as a triple: the sentence, the described LU and the lexical element. The positions of both expressions in the sentence were marked. The task of the evaluator (one of the co-authors) was to judge whether the relation described by the constraint holds for the given pair in the given sentence. The sample sizes were chosen according to the method described by Israel (1992), so that the results of the sample evaluation can be ascribed to the whole set at the 95% confidence level.

Constraint       AdjC    NcC     NmgC    VsbC
Precision [%]    97.39   67.78   92.36   80.36

Table 3.6: The accuracy of the lexico-morphosyntactic constraints

As one could expect, the highest accuracy was achieved for the AdjC constraint, which is based strongly on agreement. The tagger caused the majority of the errors. In some cases, an adjective located between two nouns with the same values of the analysed grammatical categories was mistakenly associated with the wrong noun. The good result of NmgC was to a large extent artificially increased by the aforementioned loose definition of the genitive nominal modifier, assumed both in NmgC and in its evaluation. For example, we did not distinguish genitive arguments of a gerund which modifies the head from the proper genitive modifiers of the head. Still, it is worth noting that we achieved relatively good results in subject identification using the fairly simple constraint mechanism VsbC.

As the majority of constraints for verbal and adjectival LUs are symmetrical or very similar to those for nominal LUs, we expect them to achieve similar accuracy.

3.4.4 Transformation based on rank weighting

In the co-incidence matrix constructed in step 2 (Section 3.4.2, p. 65) as a result of the general MSR extraction process, each LU is described by a vector of features that