A Wordnet from the Ground Up

Chapter 4. Extracting Relation Instances

4.1. Lexico-Morphosyntactic Patterns

[...] language (Section 3.4.3), very similarly to how we wrote lexico-morphosyntactic constraints for context description in MSR extraction. Preprocessing by the morphosyntactic tagger TaKIPI was assumed.

In the end, we found five different patterns for extracting hypernymy instances. We only adopted for further stages those patterns that pick out more than a few LU pairs in HC.¹ In the following schematic description, we show (i) the target nominal LUs NLU1 and NLU2 with their positions and constraints on the grammatical category values, (ii) the trigger words (base is the root, wf – the word form), (iii) constraints on LUs that occur in between.² ‘...’ denotes any sequence of tokens.

JestInst: NLU1(cas=nom) ... (base=być (to be)) ... NLU2(cas=inst, nmb=nmb(NLU1))
NLU1 is supposed to be a hyponym and NLU2 a hypernym; there is also a configuration with the reversed positions of both target LUs, and with the roles signalled by case values (Figure 4.1).

NomToNom: NLU1(cas=nom) (Adj|Adv|Noun|Num)* (base=to (≈ copulative is)) (Adj|Adv|Noun|Num)* NLU2(cas=nom)
In theory, this spots synonym pairs, but in practice it often picks out NLU2/NLU1 hypernymy (all other nouns must be in the genitive case).

IInne: NLU1 (Adj|Adv|Noun|,)* (base∈{i, oraz} (and)) (base∈{inny, pozostały} (other, remaining), nmb=pl) (Adj|Adv)* NLU2(cas=cas(NLU1))
Similar to Hearst’s well-known pattern, this also finds NLU2/NLU1 hypernymy.

TakichJak: NLU1(cas≠gen) (Adj|Adv|PartAdj|PartAdv|Noun|Num|Pron|Conj|Punct)* (base=taki (such)) (base=jak (as)) (Adj|Adv|PartAdj|PartAdv|Noun|Num|Conj|Punct)* NLU2(cas=nom)
Structurally similar to IInne: NLU2 is one of the hyponyms related to the hypernym NLU1; for details of the part between NLU1 and taki see Figure 4.2.

WTym: NLU1(cas≠gen) (Adj|Adv|Num){0,2} (base=,) (base=w) (wf=tym) (Adj|Adv|Num){0,2} NLU2(cas=cas(NLU1))
Another version of IInne: NLU1 is a hypernym and NLU2 is one of the hyponyms.

In view of the overall goal of semi-automatic expansion of plWordNet, we did not apply the patterns to the corpus completely freely. Instead, we specified the target nominal LUs not only with some constraints but also with the pairs of LUs in focus. That is why we used here the same set of 13285 nominal lemmas which were selected for the construction of the nominal MSR and the expansion of the core plWordNet (Section 3.4.5). From the set, we generated all possible pairs. We also reapplied the mechanism of co-incidence matrix construction. Target LUs were assigned to rows, and patterns with the position NLU2 instantiated to subsequent target LUs were assigned to columns. The patterns were run with position 0 representing NLU1.³

Given these assumptions, there is no need to test the presence of NLU1 in the JestInst and TakichJak code (Figures 4.1–4.2). We refer the reader to Section 3.4.3 for the details of JOSKIPI. JestInst is implemented in two symmetrical parts joined by or, one for each of the two configurations of the hyponym (NLU1) and hypernym (NLU2). The matrix construction requires that we start with the hyponym in position 0. In the first part, we first test if the potential NLU1 is nominative, then look to the right (till the end of the sentence) for the first verb word form or the first nominal LU, and record its position in variable $X. We test if it is a form of the verb być (to be); any other verb or noun means that the sentence does not match the pattern. We look further to the right for the first verb or the first nominal LU, or a preposition (prep) that requires the instrumental case. The latter is necessary, because NLU2 in the pattern is only identified by the case value induced by the verb być. The token at position $Y is compared with the base form with which the pattern was instantiated.⁴ We also test its case and number.

In the pattern TakichJak in Figure 4.2, the iteration goes in the opposite direction. Hyponyms now follow the hypernym, and we wanted to keep the same 〈hyponym, hypernym〉 order of the extracted LU pairs across all the patterns. After the case of NLU1 has been tested, we look to the left till the beginning of the sentence for the sequence taki jak (such as). Next, we test the tokens between 0 and $+2T (the position after jak) for the presence of only LUs of the specified grammatical classes plus the specified punctuation marks and conjunctions; this signals a coordinate sequence of noun phrases. Finally, NLU2 is sought further to the left, and the tokens between it and taki are tested. Only modifiers are accepted there, including nouns and pronouns in the genitive case.

The implementation of the other three patterns is similar.

The patterns IInne, WTym and TakichJak are structurally very similar: a hypernym and a list of hyponyms. Also, a preliminary evaluation on a part of IPIC showed [...]

¹ Some rejected patterns may be more prolific in a larger corpus, but we focused on the sensitivity to plWordNet’s understanding of hypernymy.
² Pattern names contain the Polish trigger words.
³ Multiword LUs were recognised during preprocessing and folded into a one-token representation with the attribute and root set to the values proper for the whole LU. During matrix construction, each target LU occupies exactly one token in the preprocessed representation of the corpus (Broda and Piasecki, 2008b). Recognition of multiword LUs was limited to target LUs (all parts of speech) due to the labour intensity of their syntactic description.
⁴ Technically, each column in the matrix is assigned its own copy of the pattern instantiated to the appropriate nominal LU as NLU2.
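To make the matrix construction more concrete, the following is a minimal Python sketch of how one pattern (IInne) could be run with NLU1 at position 0 and with one pattern copy per candidate NLU2. It is not the JOSKIPI code of Figures 4.1–4.2. The Token class, the tag names ("noun", "adj", "conj", ...), the functions match_iinne and build_matrix, and the example sentence are all assumptions introduced here for illustration only.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Token:
    """One tagged token (hypothetical representation, not TaKIPI output)."""
    base: str      # base form (lemma)
    pos: str       # grammatical class, e.g. "noun", "adj", "adv", "conj", "punct"
    cas: str = ""  # case, e.g. "nom", "gen", "inst"
    nmb: str = ""  # number: "sg" or "pl"


CONJUNCTIONS = {"i", "oraz"}           # trigger: and
OTHER_WORDS = {"inny", "pozostały"}    # trigger: other, remaining


def match_iinne(sent: List[Token], start: int, nlu2_base: str) -> bool:
    """Test the IInne pattern, instantiated to nlu2_base, with NLU1 at `start`."""
    nlu1 = sent[start]
    if nlu1.pos != "noun":
        return False
    i, n = start + 1, len(sent)
    # (Adj|Adv|Noun|,)*
    while i < n and (sent[i].pos in {"adj", "adv", "noun"} or sent[i].base == ","):
        i += 1
    # (base in {i, oraz})
    if i >= n or sent[i].base not in CONJUNCTIONS:
        return False
    i += 1
    # (base in {inny, pozostały}, nmb=pl)
    if i >= n or sent[i].base not in OTHER_WORDS or sent[i].nmb != "pl":
        return False
    i += 1
    # (Adj|Adv)*
    while i < n and sent[i].pos in {"adj", "adv"}:
        i += 1
    if i >= n:
        return False
    # NLU2(cas=cas(NLU1)), compared with the base form used for instantiation
    nlu2 = sent[i]
    return nlu2.pos == "noun" and nlu2.base == nlu2_base and nlu2.cas == nlu1.cas


def build_matrix(sentences: List[List[Token]], targets: Set[str]) -> Counter:
    """Rows are target LUs at position 0; columns are (pattern, NLU2) copies."""
    matrix = Counter()
    for sent in sentences:
        for pos0, tok in enumerate(sent):
            if tok.pos != "noun" or tok.base not in targets:
                continue
            # Naive loop: one pattern copy per candidate NLU2 (one per column).
            for nlu2_base in targets:
                if match_iinne(sent, pos0, nlu2_base):
                    matrix[(tok.base, ("IInne", nlu2_base))] += 1
    return matrix


# "psy, koty i inne zwierzęta" (dogs, cats and other animals)
sentence = [
    Token("pies", "noun", "nom", "pl"),
    Token(",", "punct"),
    Token("kot", "noun", "nom", "pl"),
    Token("i", "conj"),
    Token("inny", "adj", "nom", "pl"),
    Token("zwierzę", "noun", "nom", "pl"),
]
matrix = build_matrix([sentence], {"pies", "kot", "zwierzę"})
```

In this toy run, both pies and kot are recorded once in the column for IInne instantiated to zwierzę, which gives two 〈hyponym, hypernym〉 candidates. Looping over every target for every position is only meant to mirror the "one pattern copy per column" description; a practical implementation would more likely read the base form of the token found in the NLU2 position and look it up in the target set.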