A Wordnet from the Ground Up

4.2. Benefits of Handwritten Patterns 107

in identifying the right place for a new LU in the existing hypernymy structure. This is how we will use them in the wordnet expansion algorithm presented in Section 4.5.3.

              IPIC            WebCorp         RzCorp          all
Pattern       No.     Acc.    No.     Acc.    No.     Acc.    No.      Acc.
JestInst      60880   11.61   44888   11.97   30063   12.42   121730   10.89
NomToNom      10404   13.5    6414    15.43   4465    14.85   20310    15.66
mIInne        14611   30.06   5983    32.52   6682    33.16   24437    30.69
in 2 patt.    –       –       –       –       –       –       8777     41.05
in 3 patt.    –       –       –       –       –       –       620      74.03

Table 4.1: The results of hypernymy instance extraction by manually constructed lexico-morphosyntactic patterns (No. is the number of LU pairs extracted, Acc. is the accuracy [%], "in i patt." denotes LU pairs occurring in the results of at least i patterns)

The accuracy increased when we applied the patterns only to the closed list of target nominal LUs. In preliminary experiments on a practically unlimited set of target nominal LUs acquired from the joint corpus, the acquired LU pairs were less accurate.

The last two rows of Table 4.1 present the results of voting among the three patterns applied to the joint corpus. When we require an LU pair to be confirmed by all three patterns, the accuracy more than doubles relative to mIInne, the best of the three patterns used alone (74.03% vs. 30.69%). On the other hand, the number of LU pairs confirmed by two or three patterns drops sharply in relation to the lists produced by the individual patterns. The voting experiments showed that the lower-accuracy patterns JestInst and NomToNom can help increase the final accuracy when combined with mIInne.

The corpora appear largely independent with respect to the unique LU pairs extracted: for all three patterns, the number of pairs extracted from the joint corpus is almost the sum of the numbers for the contributing corpora.⁶
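The voting behind the last two rows of Table 4.1 (an LU pair counts if it occurs in the results of at least i patterns) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: each pattern's output is taken to be a set of (hyponym, hypernym) lemma pairs, and the function name `vote` and the toy pairs are hypothetical, not data from the experiments.

```python
from collections import Counter


def vote(pattern_results, min_votes):
    """Return the LU pairs proposed by at least `min_votes` patterns.

    `pattern_results` maps a pattern name to the (hyponym, hypernym) pairs
    it extracted. Each pattern's output is turned into a set, so a pattern
    votes for a given pair at most once.
    """
    votes = Counter()
    for pairs in pattern_results.values():
        votes.update(set(pairs))
    return {pair for pair, n in votes.items() if n >= min_votes}


# Hypothetical toy output (Polish lemmas), not data from Table 4.1.
results = {
    "JestInst": {("pies", "zwierzę"), ("róża", "kwiat"), ("samochód", "pojazd")},
    "NomToNom": {("pies", "zwierzę"), ("dąb", "drzewo")},
    "mIInne": {("pies", "zwierzę"), ("róża", "kwiat"), ("dąb", "roślina")},
}

print(sorted(vote(results, 2)))  # [('pies', 'zwierzę'), ('róża', 'kwiat')]
print(sorted(vote(results, 3)))  # [('pies', 'zwierzę')]
```

Raising `min_votes` trades coverage for accuracy, which is the trade-off visible in the "in 2 patt." and "in 3 patt." rows.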
Still, not every corpus appeared to be an equally good basis for the application of patterns.

It is hard to find a correlation between the frequencies of the extracted LU pairs and their accuracy, especially for JestInst. High frequencies (> 100) are produced by collocations, while the typical frequency of a pair is 1–3, too low for statistical evaluation. A potential evaluation should take into account the statistical properties of the LUs and of the pairs. Such a mechanism has been proposed in the literature for extraction based on automatically generated generic patterns; we discuss it in Section 4.3.

⁶ We tried to make the corpora free of duplicated texts (some duplication seems unavoidable), and the corpora covered, to some extent, various genres, but the results were still surprising.
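The near-additivity of the per-corpus results can be checked directly from the counts in Table 4.1. The sketch below assumes that the joint-corpus result is comparable to the union of the per-corpus results; `joint_to_sum_ratio` is a hypothetical helper, and a ratio close to 1.0 would indicate little overlap between the corpora.

```python
# Unique LU pairs per pattern from Table 4.1: (IPIC, WebCorp, RzCorp, joint corpus).
COUNTS = {
    "JestInst": (60880, 44888, 30063, 121730),
    "NomToNom": (10404, 6414, 4465, 20310),
    "mIInne": (14611, 5983, 6682, 24437),
}


def joint_to_sum_ratio(ipic, webcorp, rzcorp, joint):
    """Ratio of joint-corpus pairs to the sum of the per-corpus pair counts.

    Assuming the joint result behaves like the union of the per-corpus
    results, a value close to 1.0 means the per-corpus sets barely overlap.
    """
    return joint / (ipic + webcorp + rzcorp)


for pattern, row in COUNTS.items():
    print(f"{pattern}: {joint_to_sum_ratio(*row):.3f}")
```

With these figures the ratios come out at about 0.896 (JestInst), 0.954 (NomToNom) and 0.896 (mIInne).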
108 Chapter 4. Extracting Relation Instances

Metaphor is a major source of errors, and an even greater one is relations between larger noun phrases, which the patterns assign only to their heads. A typical situation: NomToNom captures an NLU2 that includes a relative clause, but only its head is considered. Even a nominal modifier in the genitive or an adjectival modifier often makes the meaning of the noun phrase different from the lexical meaning of the head. The conditions in mIInne do not constrain the case of the nominal LUs, so hyponymy is quite often erroneously recognized for a noun in the genitive that is not the head. It is not easy, however, to identify complex Polish noun phrases in the genitive. The error rate would be cut if we could apply a good chunker, or even a shallow parser, combined with an analysis of the meaning relations between structurally related noun phrases; see, for example, Jacquemin (2001).

Examples of LU pairs extracted by all three patterns appear in Figure 4.3. Figure 4.4 presents examples of LU pairs extracted from the joint corpus by each of the three patterns.

The results of applying lexico-morphosyntactic patterns are valuable, but there remains an impression that more could be achieved by following the main line of the pattern-based paradigm. We now shift our attention to approaches to the automatic extraction and evaluation of more generic patterns.

4.3 Generic Patterns Verified Statistically

Manual construction of lexico-syntactic patterns is not laborious if we rely on intuition rather than on an intensive survey of known hypernymy instances and the contexts in which they occur in a corpus. Morin and Jacquemin (1999) proposed a semi-automated discovery of lexico-syntactic patterns. Given a predefined list of hypernymy instances, sentences including these LU pairs are extracted and transformed into “lexico-syntactic expressions”.
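The first step of this procedure, turning sentences that contain a seed pair into expressions, can be illustrated with a deliberately simplified Python sketch. It is not Morin and Jacquemin's implementation: it assumes lemmatised, tokenised sentences, keeps only the tokens between the two LUs, uses the first occurrence of each LU, and the helper `to_expression` and the placeholders `LU1`/`LU2` are hypothetical. English lemmas are used only for readability.

```python
from collections import Counter


def to_expression(tokens, hyponym, hypernym):
    """Rewrite a sentence containing a seed pair as a toy expression.

    Hypothetical simplification: only the tokens from the first occurrence
    of one LU to the first occurrence of the other are kept, and the two LUs
    are replaced by the placeholders LU1 (hyponym) and LU2 (hypernym).
    Returns None if either LU is missing from the sentence.
    """
    if hyponym not in tokens or hypernym not in tokens:
        return None
    i, j = tokens.index(hyponym), tokens.index(hypernym)
    span = tokens[min(i, j):max(i, j) + 1]
    return tuple(
        "LU1" if t == hyponym else "LU2" if t == hypernym else t for t in span
    )


# Toy seeds and lemmatised sentences (English lemmas for readability).
seeds = [("dog", "animal"), ("oak", "tree")]
sentences = [
    ["we", "saw", "dog", "and", "other", "animal"],
    ["oak", "and", "other", "tree", "grow", "here"],
    ["tree", "such", "as", "oak"],
]

# Count how often each expression is produced across all seed pairs.
expressions = Counter()
for tokens in sentences:
    for hypo, hyper in seeds:
        expr = to_expression(tokens, hypo, hyper)
        if expr is not None:
            expressions[expr] += 1

for expr, count in expressions.most_common():
    print(count, " ".join(expr))
```

Identical expressions obtained from different seed pairs, such as `LU1 and other LU2` here, are the kind of commonality that the subsequent generalisation step turns into a pattern.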
Next, common environments that generalise the expressions are produced by considering the similarity of the expressions and applying a generalisation procedure: lexico-syntactic patterns describing the commonalities of expression subgroups are deduced. The pattern extraction procedure still assumes manual verification of the deduced patterns, and the patterns are then applied without automatic evaluation of their accuracy or of the reliability of the extracted pairs. Such evaluation is especially important when generic (weakly constraining) patterns are applied to large corpora.

Manually constructed patterns are claimed to have good precision but very low recall (Hearst, 1998). Recall can be increased by using more generic patterns extracted automatically from a corpus, with broad coverage but intrinsically low precision.

Most of the proposed methods follow a common scheme: given initial examples of the target relations, henceforth called seeds, patterns are generated from the corpus and then used to extract further instances. Methods differ in pattern generation