
Automatic Extraction of Examples for Word Sense Disambiguation

CHAPTER 6. AUTOMATIC EXTRACTION OF EXAMPLES FOR WSD 61

course, this is especially good for fully supervised approaches, since the training data is then considerably smaller and exceptional cases can be represented by only a few instances. A drawback, however, is the fact that each instance is taken into account with all of the features present in its FV. Since memory-based learners are extremely sensitive to irrelevant features, their presence leads to a significant decrease in accuracy. For this reason we use a forward-backward algorithm similar to those of Dinu and Kübler (2007) and Mihalcea (2002), which improves the performance of a word-expert by selecting a subset of the features in the FV provided for the target word. This subset contains fewer, but more relevant, features for the disambiguation process. The exact forward (9) and backward (10) algorithms that we used, following Dinu and Kübler (2007), are as follows:

(9) function ForwardAutomaticFeatureSelection
1: generate a pool of features PF = {Fi}
2: initialize the set of selected features with the empty set SF = Ø
3: extract training and testing corpora for the given target ambiguous word
4: repeat
5:   for each feature Fi ∈ PF do
6:     run a disambiguation experiment on the training set; each example in the training set contains the features in SF plus the feature Fi
7:   end for
8:   determine the feature Fi leading to the best accuracy
9:   remove Fi from PF and add it to SF
10: until no improvements are obtained

(10) function BackwardAutomaticFeatureSelection
1: generate a pool of features PF = {Fi}
2: extract training and testing corpora for the given target ambiguous word
3: repeat
4:   for each feature Fi ∈ PF do
5:     run a disambiguation experiment on the training set with the features in PF without Fi
6:   end for
7:   determine the feature Fi leading to the worst accuracy
8:   remove Fi from PF
9: until no removal improves the accuracy or keeps it stable
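As an illustration, the forward procedure in (9) can be sketched as follows. The evaluate callback stands in for running a full disambiguation experiment, and the toy scoring function at the bottom is our own assumption, not part of the system:

```python
def forward_selection(features, evaluate):
    """Greedy forward feature selection as in (9): start from an
    empty set SF and repeatedly add the single feature from the
    pool PF that yields the best accuracy, until no addition
    improves it.  evaluate(subset) stands in for running one
    disambiguation experiment and returning its accuracy."""
    pool = set(features)                 # PF
    selected = []                        # SF
    best_acc = evaluate(selected)
    while pool:
        # one experiment per remaining feature, with SF plus that feature
        scored = [(evaluate(selected + [f]), f) for f in sorted(pool)]
        acc, best_f = max(scored)        # ties broken by feature name
        if acc <= best_acc:              # no improvement: stop
            break
        best_acc = acc
        pool.remove(best_f)
        selected.append(best_f)
    return selected, best_acc

# toy evaluation: two features are "relevant", every other one only hurts
relevant = {"pos-1", "lemma"}
def toy_eval(subset):
    good = len(relevant & set(subset))
    return 0.5 + 0.2 * good - 0.05 * (len(subset) - good)

sf, acc = forward_selection(["pos-1", "pos+1", "lemma", "shape"], toy_eval)
# sf == ["pos-1", "lemma"]: the irrelevant features are never added
```

The backward variant in (10) is analogous: start from the full pool and repeatedly drop the feature whose removal yields the best accuracy, stopping once every removal hurts.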

6.7 Scoring

Having trained a classifier and tested it on the provided examples, our system returns an answer for each of the instances in the test set. These answers, however, need to be checked, and for this purpose Senseval-3 provides special scoring software (scorer2 13), written in C. We scored our system for precision and recall against two of the three provided scoring schemes: fine-grained (the answers of the system need to match the correct answers from the scorer exactly) and coarse-grained (the answers are compared to the coarse-grained senses) (see Section 4.1).

The scoring software expects a specific format for the answers, which does not correspond to the output file of the memory-based learner. More precisely, the scorer expects each line to contain the id of the lexical item (the value of the item attribute in the lexelt tag), the id of the instance (the value of the id attribute in the instance tag), all answers of the system (in the same vector) and, if needed, a comment. All separate parts of the answer vector need to be separated by spaces. Scorer2 ignores all answers that have already been scored (in case more than one answer vector is provided for an instance). A very interesting feature of the software is the fact that it allows the use of weights (probabilities) for the answers.

Again for clarification, let us consider our toy example and how the correct answer for it should look. First we need the reference id of the lexical item, which in our case is activate.v. Second, the reference id of the instance, activate.v.cald.12345678, and then the answer, 38201, which all together looks like:

(11) activate.v activate.v.cald.12345678 38201

6.8 Experimental Results

In order to get a good overview of the strengths and weaknesses of our system, as well as a good assessment of the data automatically added to it, we conducted several different experiments. We started by training word-experts only on the manually annotated instances (see Section 6.8.1), which shows how well the system is designed and gives us a basis for comparison with already existing systems reported in the Senseval-3 competition. However, since our main aim is to examine the instances that we gathered automatically, we also trained word-experts only on them (Section 6.8.2), or used them to extend the already existing Senseval-3 corpus (Section 6.8.3).

13 ∼rada/senseval/senseval3/scoring/scoring.tar.gz
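Producing answer lines in the space-separated format of example (11) from the learner's output is straightforward; the helper below is a minimal sketch (the function name is ours, and weighted answers and comments are left out):

```python
def format_answer(lexelt_id, instance_id, answers):
    # one scorer2 answer line: lexical-item id, instance id, then the
    # system's answer(s), all parts separated by single spaces
    return " ".join([lexelt_id, instance_id, *map(str, answers)])

line = format_answer("activate.v", "activate.v.cald.12345678", [38201])
# line == "activate.v activate.v.cald.12345678 38201", matching (11)
```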
