Automatic Extraction of Examples for Word Sense Disambiguation

CHAPTER 6. AUTOMATIC EXTRACTION OF EXAMPLES FOR WSD

course, this is especially good for fully supervised approaches, since the training data is then considerably smaller and exceptional cases can be represented with only a few instances. A negative feature, however, is the fact that each instance is taken into account with all its features present in the FV. Since memory-based learners are extremely sensitive to irrelevant features, this leads to a significant decrease in accuracy whenever such features are present. For this reason we use a forward-backward algorithm similar to those of Dinu and Kübler (2007) and Mihalcea (2002), which improves the performance of a word-expert by selecting a subset of the features in the provided FV for the targeted word. This subset contains fewer features, but ones that are more relevant to the disambiguation process. The exact forward (9) and backward (10) algorithms that we used, following Dinu and Kübler (2007), are as follows:

(9) function ForwardAutomaticFeatureSelection
      generate a pool of features PF = {Fi}
      initialize the set of selected features with the empty set SF = ∅
      extract training and testing corpora for the given target ambiguous word
      repeat
        for each feature Fi ∈ PF do
          run a disambiguation experiment on the training set; each example in the training set contains the features in SF and the feature Fi
        end for
        determine the feature Fi leading to the best accuracy
        remove Fi from PF and add it to SF
      until no improvements are obtained

(10) function BackwardAutomaticFeatureSelection
      generate a pool of features PF = {Fi}
      extract training and testing corpora for the given target ambiguous word
      repeat
        for each feature Fi ∈ PF do
          run a disambiguation experiment on the training set
        end for
        determine the feature Fi leading to the worst accuracy
        remove Fi from PF
      until improvements are obtained or the accuracy remains stable
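The two selection procedures can be sketched in Python as follows. This is only an illustrative sketch, not the implementation used in the thesis: the `evaluate` callable is a hypothetical stand-in for "run a disambiguation experiment on the training set" and is assumed to return an accuracy for a given feature subset. The stopping criteria are also an interpretation: the forward pass stops when no candidate improves the accuracy, and the backward pass is assumed to keep removing features as long as the accuracy improves or stays the same. The feature names and the toy accuracy function are invented for the example.

```python
from typing import Callable, FrozenSet, Iterable, Set

# Hypothetical interface: maps a feature subset to the accuracy of one experiment.
Evaluate = Callable[[FrozenSet[str]], float]


def forward_selection(pool: Iterable[str], evaluate: Evaluate) -> Set[str]:
    """Greedily add the best feature until no candidate improves accuracy."""
    remaining = set(pool)          # PF
    selected: Set[str] = set()     # SF, initially empty
    best_acc = float("-inf")
    while remaining:
        # One experiment per candidate: the features in SF plus the candidate Fi.
        scores = {f: evaluate(frozenset(selected | {f})) for f in sorted(remaining)}
        candidate = max(scores, key=scores.get)
        if scores[candidate] <= best_acc:
            break                  # no improvement obtained
        best_acc = scores[candidate]
        remaining.remove(candidate)
        selected.add(candidate)
    return selected


def backward_selection(pool: Iterable[str], evaluate: Evaluate) -> Set[str]:
    """Greedily drop the most harmful feature while accuracy does not decrease."""
    current = set(pool)            # PF
    best_acc = evaluate(frozenset(current))
    while len(current) > 1:
        # One experiment per candidate: all current features except Fi.
        scores = {f: evaluate(frozenset(current - {f})) for f in sorted(current)}
        # Assumption: the "worst" feature is the one whose removal helps most.
        candidate = max(scores, key=scores.get)
        if scores[candidate] < best_acc:
            break                  # every removal now lowers the accuracy
        best_acc = scores[candidate]
        current.remove(candidate)
    return current


# Toy accuracy function (invented): two helpful features, two irrelevant ones.
USEFUL = {"pos", "collocation"}


def toy_accuracy(features: FrozenSet[str]) -> float:
    helpful = len(features & USEFUL)
    noisy = len(features - USEFUL)
    return 0.5 + 0.2 * helpful - 0.1 * noisy


POOL = ["pos", "collocation", "noise_a", "noise_b"]
print(sorted(forward_selection(POOL, toy_accuracy)))
print(sorted(backward_selection(POOL, toy_accuracy)))
```

Passing the experiment in as a callable keeps the search logic independent of the learner; in practice each call would train and test a memory-based classifier on the training set restricted to the given features.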
6.7 Scoring

Having trained a classifier and tested it on the provided examples, our system returns answers for each of the instances in the test set. Those answers, however, need to be checked, and for this purpose Senseval-3 provides special scoring software (scorer2 13), developed in C. We scored our system for precision and recall against two of the three provided scoring schemes: fine-grained (the answers of the system need to match the correct answers from the scorer exactly) and coarse-grained (the answers are compared to the coarse-grained senses) (see Section 4.1).

The scoring software expects a specific format for the answers, which does not correspond to the output file from the memory-based learner. More precisely, the scorer expects each line to contain the id for the lexical item (this is the value of the attribute item in the tag), the id for the instance (the value of the id attribute in the tag), all possible answers of the system (in the same vector) and, if needed, a comment. All parts of the answer vector need to be separated by spaces. Scorer2 ignores all answers that have already been scored (in case more than one answer vector is provided for an instance). A very interesting feature of the software is that it allows the use of weights (probabilities) for the answers.

Again for clarification, let us consider our toy example and what exactly the correct answer for it should look like. First we need the reference id for the lexical item, which in our case is activate.v. Second, we need the reference id for the instance, activate.v.cald.12345678, and then the answer, 38201, which all together looks like:

(11) activate.v activate.v.cald.12345678 38201

6.8 Experimental Results

In order to get a good overview of the strengths and weaknesses of our system, as well as a good assessment of the data automatically added to it, we conducted several different experiments. We started by training word-experts only on the manually annotated instances (see Section 6.8.1), which shows how well the system is designed and gives us a basis for comparing it to existing systems reported in the Senseval-3 competition. However, since our main aim is to examine the instances that we gathered automatically, we also trained word-experts only on them (Section 6.8.2) and used them to extend the existing Senseval-3 corpus (Section 6.8.3).

13 http://www.cse.unt.edu/~rada/senseval/senseval3/scoring/scoring.tar.gz
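As an illustration of the answer format described in Section 6.7, the following minimal sketch builds one answer line from the three mandatory parts. The function name `format_answer_line` is hypothetical and not part of scorer2 or of the thesis system; comments and weights are deliberately left out, because their exact syntax is not given in the text.

```python
from typing import Sequence


def format_answer_line(lexelt_id: str, instance_id: str, answers: Sequence[str]) -> str:
    """Join the lexical item id, the instance id and the answers with single spaces."""
    if not answers:
        raise ValueError("at least one answer is required")
    return " ".join([lexelt_id, instance_id, *answers])


# The toy example from the text, reproducing example (11).
line = format_answer_line("activate.v", "activate.v.cald.12345678", ["38201"])
print(line)
```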