Automatic Extraction of Examples for Word Sense Disambiguation

word. Be that as it may, looking closely at Tables 7.1 and 7.4 we can note a variation of up to 4% as a result of those very few added examples. This means that even with a very small number of instances, carefully selected out of a huge pool of examples, we obtain an exceptionally big variation, and in some cases individual word-experts improve by up to 4% in accuracy.

    feature selection    coarse P    coarse R    fine P    fine R
    all features           67.0        67.0       62.3      62.3
    forward                76.9        76.9       72.4      72.4
    backward               73.0        73.0       68.9      68.9
    MFS                    64.5        64.5       55.2      55.2
    best performance       78.5        78.5       73.9      73.9

Table 6.13: System performance on automatically annotated data without optimization of the k parameter and distance and distribution filtering, mixed with the manually annotated training set.

    feature selection    coarse P    coarse R    fine P    fine R
    all features           70.5        70.5       64.9      64.9
    forward                78.7        78.7       74.0      74.0
    backward               77.6        77.6       73.0      73.0
    MFS                    64.5        64.5       55.2      55.2
    best performance       79.4        79.4       75.0      75.0

Table 6.14: System performance on automatically annotated data with optimization of the k parameter and distance and distribution filtering, mixed with the manually annotated training set.

Of course, extending this training set further manually would result in a lot of very expensive human work. Our approach, however, can be used for this purpose. As we saw, from our original automatically collected training set of about 170,000 instances we managed to extract only 141 examples that lie at the shortest similarity distance and bring an improvement in accuracy. Moreover, those 141 examples, even if very similar to the ones we already have, always bring new information along. Consequently, this new information makes the classifier less sensitive to new examples.
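The forward and backward rows in the tables above refer to greedy wrapper feature selection around the word-expert classifier. The forward variant can be sketched as follows — a minimal pure-Python illustration using a 1-nearest-neighbour word-expert with leave-one-out accuracy; the toy symbolic features and the Hamming distance are illustrative assumptions, not the thesis implementation:

```python
# Greedy forward feature selection: start from no features, repeatedly add
# the single feature whose addition most improves held-out accuracy, and
# stop as soon as no remaining feature helps.

def knn_accuracy(X, y, feats):
    """Leave-one-out accuracy of a 1-NN classifier restricted to `feats`."""
    if not feats:
        return 0.0
    correct = 0
    for i in range(len(X)):
        # Hamming distance over the selected (symbolic) features.
        nearest = min((j for j in range(len(X)) if j != i),
                      key=lambda j: sum(X[i][f] != X[j][f] for f in feats))
        correct += (y[nearest] == y[i])
    return correct / len(X)

def forward_selection(X, y):
    """Return the greedily selected feature indices and their accuracy."""
    selected, best_acc = [], 0.0
    remaining = set(range(len(X[0])))
    while remaining:
        f, acc = max(((f, knn_accuracy(X, y, selected + [f]))
                      for f in remaining), key=lambda p: p[1])
        if acc <= best_acc:
            break  # no remaining feature improves accuracy
        selected.append(f)
        remaining.remove(f)
        best_acc = acc
    return selected, best_acc

# Toy word-expert data: feature 0 predicts the sense, feature 1 is noise.
X = [("bank", "a"), ("bank", "b"), ("river", "a"), ("river", "b")]
y = ["money", "money", "water", "water"]
print(forward_selection(X, y))  # selects only the informative feature 0
```

Backward selection is the mirror image: start from all features and greedily drop the one whose removal helps most, stopping when every removal hurts.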
As a result we can conclude that adding automatically acquired examples to the manually prepared data set helps, and that our approach reached such good results via an easy and computationally "cheap" method for gaining new instances for training of WSD systems.

6.8.4 Discussion

The experiments we conducted and described in the previous few sections show that our system performs competitively with other state-of-the-art systems. The data that we used, however, has several

advantages but also some disadvantages, which can be clearly seen in the results of the experiments and which we will briefly discuss in this section. Since we used automatically prepared data together with the manually annotated one, we had the chance to observe the following issues:

The manually annotated data - disadvantages:

• The preparation of the data implies an extremely laborious, costly and time-intensive process of human annotation, which can hardly be accomplished in a short time.
• The amount of data is significantly small, which is an effect of its costly preparation.
• As a result of the manual annotation, the nature of the training data is optimally adapted to the nature of the test data:
  – proportional distribution of the occurring senses
  – very similar length of the context
  – coverage of only a subset of the actual senses of the word
• The above-mentioned issue leaves hardly any possibility for improvement, except by carefully and precisely filtered external data.
• The preparation of such data also implies predefined semantic resources, which:
  – are not available for all languages
  – differ considerably in quality
  – often provide a granularity of senses different from the one needed for the task

The manually annotated data - advantages:

• As a result of its careful preparation, the manually annotated data leads to exceptionally good system performance, outperforming unsupervised and semi-supervised methods.
• It employs only well-selected senses that are predefined in the training data.
• It has an extremely homogeneous nature, which ensures good performance of the system.

The automatically annotated data - disadvantages:

• Used without any additional processing, it can lead to poor performance of the WSD system.
• The derived senses for the target word are specific to the corpus they are extracted from.
• It is never as homogeneous as manually prepared data and thus can lead to poorer results.
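The drawbacks of automatically annotated data listed above are exactly what the distance and distribution filtering is meant to counter: per sense, keep only the few automatic examples that lie closest to the manually annotated ones, so that the sense distribution of the manual set is preserved. A minimal sketch of the idea — the token-overlap distance and the function names are illustrative assumptions, not the thesis implementation:

```python
def context_distance(a, b):
    """1 minus the Jaccard token overlap between two context bags of words."""
    a, b = set(a), set(b)
    return 1.0 - len(a & b) / len(a | b) if (a or b) else 1.0

def filter_examples(manual, auto, per_sense=1):
    """Keep at most `per_sense` automatically annotated examples per sense,
    choosing those at the shortest distance to any manually annotated
    example of the same sense (preserves the manual sense distribution)."""
    kept = []
    for sense in {s for _, s in manual}:
        anchors = [ctx for ctx, s in manual if s == sense]
        scored = sorted(
            (min(context_distance(ctx, a) for a in anchors), ctx)
            for ctx, s in auto if s == sense)
        kept.extend((ctx, sense) for _, ctx in scored[:per_sense])
    return kept

# Toy contexts for the target word "bank", two senses.
manual = [(["interest", "rate", "bank"], "finance"),
          (["river", "bank", "water"], "shore")]
auto = [(["bank", "loan", "rate"], "finance"),
        (["bank", "holiday", "monday"], "finance"),
        (["muddy", "river", "bank"], "shore")]
print(filter_examples(manual, auto))  # keeps the closest example per sense
```

With `per_sense=1` the "finance" candidate sharing two tokens with its anchor survives, while the more distant "bank holiday" context is discarded — mirroring how only 141 of roughly 170,000 automatic instances passed the filter.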
