Automatic Extraction of Examples for Word Sense Disambiguation

CHAPTER 2. BASIC APPROACHES TO WORD SENSE DISAMBIGUATION

Decision lists (Yarowsky, 1995; Martínez et al., 2002), as their name suggests, are simple ordered lists of rules of the form (condition, class, weight). Such rules are usually easier to understand when thought of as if-then-else rules: if the condition is satisfied, the corresponding class is assigned. The form given above also contains a third parameter, the weight. Weights determine the order of the rules in the list: rules with higher weights are placed higher in the list, and rules with lower weights are found further down. The order matters during classification, since the rules are tested sequentially and the first rule that "succeeds" is used to assign the sense to the example. Usually the last rule in a list is the default rule, which accepts all remaining cases.

Decision trees (Mooney, 1996) are very similar to decision lists but are used less often for word sense disambiguation. They also use classification rules, but the rules are not ordered in a list. Instead they are arranged as an n-ary branching tree structure that represents the training set. Hence every branch of the tree represents a rule that tests a conjunction of features and provides a prediction of the class label encoded in the terminal node (also called a leaf node: a node in a tree data structure that has no child nodes). Two problems that make decision trees impractical for WSD are their computational cost and the data fragmentation (breaking up the data into many pieces that are not close together) that they cause. The latter leads to an immense increase in computation if larger feature spaces are used.
For decision trees, a large number of training examples likewise drives up the computation. If fewer training instances are provided, however, the reliability of the class-label predictions decreases.

Rule combination for supervised word sense disambiguation means that a set of homogeneous classification rules is combined and learned by a single algorithm.

AdaBoost (Schapire, 2003) is a very frequently used rule combination algorithm. It combines multiple classification rules into a single classifier. The power of AdaBoost lies in the fact that the individual classification rules need not be very accurate, yet the classifier that results from combining them can reach an arbitrarily low error rate.

Linear classifiers, which in their basic form are binary classifiers, have achieved comparatively poor results in the last few decades, and so the greatest interest in them has come from the field of Information Retrieval. Classifiers of this kind decide on the classification label based on a linear combination of the features in their FVs. They aim to group the instances with the most similar feature values. The limited amount of work on linear classifiers has produced several articles, for example (Mooney, 1996; Escudero et al., 2000b; Bartlett et al., 2004; Abdi et al., 1996; Cohen and Singer, 1999). For non-linear problems, where the expressivity of linear classifiers is not sufficient, the use of kernel functions (kernel methods) has been suggested.
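To illustrate what "a linear combination of the features" means in practice, the following is a minimal sketch rather than an implementation taken from any of the cited works. The feature names, the weights, the bias, and the two sense labels are all hypothetical and chosen only for illustration. In a real system the weights would be learned from training data.

```python
from typing import Dict

# Hypothetical, hand-picked weights for the ambiguous word "bank".
# Positive weights favour the financial sense, negative ones the river sense.
WEIGHTS: Dict[str, float] = {
    "ctx=money": 1.5,
    "ctx=loan": 1.0,
    "ctx=river": -2.0,
    "ctx=water": -1.0,
}
BIAS = 0.0  # assumed threshold offset


def linear_score(fv: Dict[str, float]) -> float:
    """Return the bias plus the weighted sum of the feature values in fv.

    Features without a known weight contribute nothing to the score.
    """
    return BIAS + sum(WEIGHTS.get(name, 0.0) * value for name, value in fv.items())


def classify(fv: Dict[str, float]) -> str:
    """Binary decision: the sign of the linear score selects the sense."""
    return "bank/finance" if linear_score(fv) >= 0 else "bank/river"


if __name__ == "__main__":
    print(classify({"ctx=money": 1.0, "ctx=loan": 1.0}))
    print(classify({"ctx=river": 1.0, "ctx=water": 1.0}))
```

The decision boundary of such a classifier is a hyperplane in the feature space, which is why problems that are not linearly separable call for the kernel functions mentioned above.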

Kernel-based methods try to find more general types of relations in the FVs, not only linear ones as noted above. Their popularity has increased notably in the past few years, which can be seen from their growing presence at recent conferences such as Senseval-3 (see Section 4.5). Examples of applications of kernel methods in supervised approaches are described by Murata et al. (2001); Boser et al. (1992); Lee and Ng (2002); Cristianini and Shawe-Taylor (2000); Carpuat et al. (2004); Wu et al. (2004); Popescu (2004); Ciaramita and Johnson (2004). One of the most popular kernel methods is the Support Vector Machine (SVM) presented by Boser et al. (1992). As Màrquez et al. (2007) report, SVMs are built on the principle of Structural Risk Minimization from Statistical Learning Theory (Vapnik, 1998). In their basic form SVMs are linear classifiers that view the input data as two sets of vectors in an n-dimensional space. They construct a separating hyperplane in that space (in geometry, a hyperplane is a higher-dimensional generalization of the concept of a line in a plane), which is used to separate the two data sets. To calculate the margin between the data sets, two parallel hyperplanes are constructed, one on each side of the separating hyperplane, each facing one of the two data sets. Naturally, a good separation is achieved by the hyperplane that has the largest distance to the neighboring data points of both sets. Where a non-linear classifier is desired, the SVM can be used with a kernel function.

Discourse properties are considered by the Yarowsky bootstrapping algorithm (Yarowsky, 1995). This algorithm is semi-supervised (see Section 2.4), which makes it hard to compare with the other algorithms in this section, but it is considered (Màrquez et al., 2007) relatively important for subsequent work on bootstrapping for WSD.
It uses either automatically or manually annotated training data that is supposed to be complete (to represent each of the senses in the set) but not necessarily large. This initially small set is used together with a supervised learning algorithm to annotate other examples. If an example is annotated with a high degree of confidence, it is added to the "seed" set, and the process continues.

Similarity-based methods are the family most relevant to our thesis, and thus we provide more in-depth information about them. Our aim is still to give an overview of those methods so that our use of them can be better understood. Approaches of this kind are very often used in supervised WSD because they carry out the disambiguation process in a very simple way. They classify a new example via a similarity metric that compares it to previously seen examples and assigns a sense to it; usually this is the MFS in a pool of the most similar examples. Over the years, probably because of its increased usage, the approach has acquired a wide variety of names: instance-based, case-based, similarity-based, example-based, memory-based, exemplar-based, analogical. As a result of the fact that the data is stored in the memory without any restructuring or abstraction
