
67.3%. From this result we could conclude that the expected performance of the systems in that task was in the range between the MFS baseline and the inter-annotator agreement (IAA). However, the best performing system was reported to have reached an accuracy of 79.3% (coarse) and 72.9% (fine). These rather unexpected results could mean either that the IAA achieved via the OMWE is not reliable enough to serve as an upper bound for the task, or that the systems perform considerably better than humans do on the given data.
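To make these two reference points concrete, the following minimal Python sketch (using hypothetical toy sense labels rather than actual Senseval data) computes the MFS baseline as the accuracy of always predicting the sense most frequent in training, and the IAA as the raw proportion of instances on which two annotators agree.

from collections import Counter

def mfs_baseline_accuracy(train_senses, test_senses):
    # Always predict the sense seen most often in the training data.
    mfs = Counter(train_senses).most_common(1)[0][0]
    return sum(1 for s in test_senses if s == mfs) / len(test_senses)

def raw_agreement(annotator_a, annotator_b):
    # Fraction of instances on which the two annotators chose the same sense.
    return sum(a == b for a, b in zip(annotator_a, annotator_b)) / len(annotator_a)

# Hypothetical labels for one ambiguous word (illustrative only).
train = ["bank/finance", "bank/finance", "bank/finance", "bank/river"]
test  = ["bank/finance", "bank/river", "bank/finance"]
ann_a = ["bank/finance", "bank/river", "bank/finance"]
ann_b = ["bank/finance", "bank/finance", "bank/finance"]

print(mfs_baseline_accuracy(train, test))  # 0.67 (MFS baseline)
print(raw_agreement(ann_a, ann_b))         # 0.67 (inter-annotator agreement)

On this reading, a supervised system is expected to score between the two figures; a score above the IAA, as for English in Senseval-3, points either to a noisy upper bound or to systems exploiting regularities on which the annotators disagreed.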

By briefly describing the Senseval-3 lexical sample task, and in particular the English lexical sample task, we aimed to give a broad overview of the attempts and achievements in this enterprise; it thus serves as a starting point for understanding the system we designed, which is fairly similar to the systems presented for the task. In Table 4.4 we have summarized the most important information from this section. We will return to these figures later in our work, when we look at the structure and results of our approach.

Language   IAA              Best Score (coarse)   System
Basque     78.2%            71.1%                 Swarthmore College
Catalan    93.16%           85.2%                 IRST
Chinese    -                60.40%                I2R-WSD (FEATUREA1)
English    67.3%            79.3%                 htsa3
Italian    73%-99%          53.1%                 IRST-Kernels
Romanian   41.72%-94.43%    77.1%                 romanian-swat_hk-bo
Spanish    83%-90%          84.20%                IRST

Table 4.4: Summary of the Senseval-3 Lexical Sample Task.

4.5.3 Other Tasks

Along with the all-words task and the lexical sample task, there were a few other competitions in Senseval-3: Automatic Subcategorization Acquisition (Preiss and Korhonen, 2004), Multilingual Lexical Sample/Translation (Chklovski et al., 2004), Word-Sense Disambiguation of WordNet Glosses (Litkowski, 2004b), Automatic Labeling of Semantic Roles (Litkowski, 2004a), Identification of Logic Forms in English (Rus, 2004), and Identification of Semantic Roles in Swedish (cancelled). However, those tasks are less relevant for our work, and for the sake of brevity we do not discuss them further; for details, we refer the reader to the relevant literature for each task.

4.6 Semeval-1<br />

Since the results from the Senseval competitions were so encouraging and there was a growing interest in the task of classifying semantic relations between pairs of words, a new exercise was
