
Automatic Extraction of Examples for Word Sense Disambiguation

CHAPTER 2. BASIC APPROACHES TO WORD SENSE DISAMBIGUATION

has in WordNet (Fellbaum, 1998). Refer to Section 6.4.1 for further information on the final FVs that we constructed for our system.

The test set is very similar to the training set; however, since it is used to evaluate the system, no class labels are included in its feature vectors. System performance is normally believed to increase with the amount of training data, and thus the training set is usually a relatively larger portion of the data than the test set. It is possible to divide the data into 5 portions, where 4 portions form the training set and 1 portion the test set, resulting in a ratio of 4:1. Other frequently used ratios are 2:1 and 9:1. According to Palmer et al. (2007), a division of 2:1 may provide a more realistic indication of a system's performance, since a larger test set is considered. Still, labeled data is scarce, which is why it is preferably used for training rather than testing. Dividing the data in any particular split, however, unavoidably introduces a bias. Consequently, a better measure of generalization accuracy has to be used in real experiments: n-fold cross-validation, in particular 10-fold cross-validation (Weiss and Kulkowski, 1991). In n-fold cross-validation the data is divided into n folds, which should ideally be of equal size. Accordingly, n separate experiments are performed, and in each experiment (also called a fold) n-1 portions of the data are used for training and 1 for testing, in such a way that each portion is used as a test item exactly once. If n equals the sample size (the size of the data set), the process is called leave-one-out cross-validation.
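The n-fold procedure described above can be sketched in a few lines; the function name and the toy data are illustrative, not part of the system described in this thesis:

```python
import random

def k_fold_splits(examples, k=10, seed=0):
    """Shuffle the labeled examples once, partition them into k folds of
    (nearly) equal size, and yield one (train, test) pair per fold, so
    that every example serves as a test item exactly once."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        yield train, test

data = list(range(20))  # stand-in for 20 labeled feature vectors
for train, test in k_fold_splits(data, k=5):
    assert len(train) + len(test) == len(data)  # each fold uses all the data
    assert not set(train) & set(test)           # train and test never overlap
```

With k equal to `len(data)` the same function performs leave-one-out cross-validation.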
2.3.5 Supervised WSD Algorithms

One of the main decisions that must be made when designing a supervised WSD system is the choice of the algorithm to be employed. Table 2.5 gives a basic overview of the most frequently used alternatives, together with literature where more information about them can be found. A short description of the algorithms is provided as well, in order to outline their usage and importance.

| Methods | Algorithms | Literature |
|---|---|---|
| Probabilistic | Naïve Bayes | (Duda et al., 2001) |
| | Maximum Entropy | (Berger et al., 1996) |
| Similarity-Based | Vector Space Model | (Yarowsky et al., 2001) |
| | k-Nearest Neighbor | (Ng and Lee, 1996; Ng, 1997a; Daelemans et al., 1999) |
| Discriminating Rules | Decision Lists | (Yarowsky, 1995; Martínez et al., 2002) |
| | Decision Trees | (Mooney, 1996) |
| Rule Combination | AdaBoost | (Schapire, 2003) |
| | LazyBoosting | (Escudero et al., 2000a,b, 2001) |
| Linear Classifier | Perceptron | (Mooney, 1996) |
| | Winnow | (Escudero et al., 2000b) |
| | Exponentiated-Gradient | (Bartlett et al., 2004) |
| | Widrow-Hoff | (Abdi et al., 1996) |
| | Sleeping Experts | (Cohen and Singer, 1999; Murata et al., 2001) |
| Kernel-Based | Support Vector Machines | (Boser et al., 1992; Lee and Ng, 2002) |
| | Kernel Principal Component Analysis | (Cristianini and Shawe-Taylor, 2000; Carpuat et al., 2004; Wu et al., 2004) |
| | Regularized Least Squares | (Popescu, 2004) |
| | Average Multiclass Perceptron | (Ciaramita and Johnson, 2004) |
| Discourse Properties | Yarowsky Bootstrapping | (Yarowsky, 1995) |

Table 2.5: Supervised word sense disambiguation algorithms.

Probabilistic methods categorize each new example by means of estimated probabilistic parameters. These parameters convey the probability distributions of the categories and of the contexts described by the features in the feature vectors. Naïve Bayes (Duda et al., 2001) is one of the simplest representatives of probabilistic methods; it presupposes the conditional independence of the features given the class label. The main idea is that an example is classified by selecting the most probable sense for the instance, considering each of its features independently according to their individual distributions. The algorithm uses the Bayes inversion rule (Fienberg, 2006).
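A minimal sketch of such a classifier, assuming add-one smoothing and a log-space argmax over senses; the function names and the toy "bank" data are illustrative only:

```python
import math
from collections import Counter, defaultdict

def train_nb(examples):
    """examples: list of (feature_list, sense) pairs. Collects the counts
    needed to estimate P(sense) and P(feature | sense)."""
    sense_counts = Counter()
    feat_counts = defaultdict(Counter)
    vocab = set()
    for feats, sense in examples:
        sense_counts[sense] += 1
        for f in feats:
            feat_counts[sense][f] += 1
            vocab.add(f)
    return sense_counts, feat_counts, vocab

def classify_nb(feats, sense_counts, feat_counts, vocab):
    """Bayes inversion rule under the naive independence assumption:
    argmax over senses s of log P(s) + sum_f log P(f | s),
    with add-one smoothing for unseen feature/sense pairs."""
    total = sum(sense_counts.values())
    best, best_lp = None, float("-inf")
    for sense, n in sense_counts.items():
        lp = math.log(n / total)
        denom = sum(feat_counts[sense].values()) + len(vocab)
        for f in feats:
            lp += math.log((feat_counts[sense][f] + 1) / denom)
        if lp > best_lp:
            best, best_lp = sense, lp
    return best

# toy training data: context words for the ambiguous noun "bank"
train_examples = [
    (["money", "deposit", "account"], "bank/finance"),
    (["loan", "money", "interest"], "bank/finance"),
    (["river", "water", "fishing"], "bank/river"),
    (["river", "grass", "slope"], "bank/river"),
]
model = train_nb(train_examples)
print(classify_nb(["money", "loan"], *model))  # → bank/finance
```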
The independence assumption is often considered a problem for Naïve Bayes, and thus alternative algorithms such as the decomposable model of Bruce and Wiebe (1994) have been developed. Maximum Entropy (Berger et al., 1996) is another quite robust probabilistic approach, which combines stochastic evidence from multiple different sources without the need for any prior knowledge of the data. Discriminating rules assign a sense to an example by selecting one or more predefined rules that are satisfied by the features of the example, and hence choosing the sense that the predictions of those rules yield. Examples of such methods are Decision Lists and Decision Trees.
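A decision list in the spirit of Yarowsky (1995) can be sketched as a ranked set of single-feature rules: each (feature, sense) pair is scored by a smoothed log-likelihood ratio, and classification applies the highest-ranked rule whose feature occurs in the example. The smoothing constant, function names, and toy data below are assumptions for illustration:

```python
import math
from collections import Counter, defaultdict

def learn_decision_list(examples, alpha=0.1):
    """Score every (feature, sense) rule by the smoothed log-likelihood
    ratio log((count(f, s) + alpha) / (count(f, not s) + alpha)) and
    return the rules sorted from strongest to weakest."""
    by_feat = defaultdict(Counter)
    senses = set()
    for feats, sense in examples:
        senses.add(sense)
        for f in feats:
            by_feat[f][sense] += 1
    rules = []
    for f, counts in by_feat.items():
        total = sum(counts.values())
        for sense in senses:
            pos = counts[sense] + alpha
            neg = total - counts[sense] + alpha
            rules.append((math.log(pos / neg), f, sense))
    return sorted(rules, reverse=True)

def classify_dl(feats, rules, default):
    # the first (highest-scoring) matching rule decides the sense
    for score, f, sense in rules:
        if f in feats:
            return sense
    return default  # e.g. the most frequent sense in training

# toy training data: context words for the ambiguous noun "bank"
train_data = [
    (["money", "deposit", "account"], "bank/finance"),
    (["loan", "money", "interest"], "bank/finance"),
    (["river", "water", "fishing"], "bank/river"),
    (["river", "grass", "slope"], "bank/river"),
]
rules = learn_decision_list(train_data)
print(classify_dl(["fishing", "slope"], rules, "bank/finance"))  # → bank/river
```

Because only the single best-matching rule fires, a decision list stays easy to inspect: each classification can be traced back to one scored rule.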
