
Automatic Extraction of Examples for Word Sense Disambiguation



CHAPTER 2. BASIC APPROACHES TO WORD SENSE DISAMBIGUATION

application (since, as noted above, the sense inventories provided by dictionaries, WordNet, or other sources are often either too fine-grained or not fine-grained enough) or the automatic derivation of bilingual dictionaries from parallel corpora (which again allows the dictionaries to be more specific to the given domain).

With the rapidly increasing use of large corpora, which are nowadays permanently available and easily extracted from the World Wide Web, unsupervised corpus-based methods have become more and more interesting to the computational linguistics community. Their application does not necessarily require much linguistic knowledge, which makes them flexible with respect to the variety of languages to which they can be applied. Linguistic knowledge, however, naturally contributes a lot to the power and robustness of unsupervised methods, and this is a direction in which they could be developed further, especially for languages for which such knowledge already exists.

In general, unsupervised corpus-based algorithms, as Mihalcea et al. (2004a) report, perform worse than supervised or knowledge-based algorithms. However, as can be seen in Table 2.2, the results presented in evaluation exercises (e.g. Senseval, see Chapter 4) have become more competitive than they were in previous years.

Table 2.2: Performance and short description for the unsupervised systems participating in the SENSEVAL-3 English lexical sample task. Precision (P) and recall (R) (see Section 4.1) figures are provided for both fine-grained and coarse-grained scoring (Mihalcea et al., 2004a).

2.3 Supervised Corpus-Based

A considerable part of our approach is based on supervised corpus-based methods, which is why we briefly discuss them as well. However, we present only the information that is most relevant to our work.

Compared with unsupervised methods, supervised corpus-based methods for word sense disambiguation are considerably more expensive in terms of the human work that must be invested in them. This work consists basically of the semantic annotation of examples, which for a domain-independent application, as Ng (1997b) suggests, need to comprise at least 1000 instances for each of 3200 different words. In their earlier work, Ng and Lee (1996) also report that the human effort necessary to create a corpus of this size is about 16 human-years. Unquestionably, this is quite a high price to pay, and it leads to consequences such as the knowledge acquisition bottleneck, in other words the lack of semantically annotated instances. This unequivocally illustrates the biggest predicament for supervised corpus-based methods but fails to show their effectiveness. In the last decade the methods of this family proved to be more successful than unsupervised or knowledge-based ones, for which evaluation exercises (see Chapter 4) provide evidence. As can be seen in Table 2.3 on page 19, supervised systems reach up to 79.3% accuracy (the accuracy of a system represents its overall performance; exactly how it is measured is described in Section 4.1), which is already a good distance from the most frequent sense (MFS) classifier baseline, which gives only about 55.2% for the data in the given experiment.

The idea behind supervised methods for WSD is directly connected with the use of machine learning for classification. Those methods automatically learn to make correct predictions as long as they are given some observations in advance.
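To make the MFS baseline mentioned above concrete, the following is a minimal sketch, not the evaluation code used in the cited experiments. It assumes that each training example is simply a (word, sense) pair and that accuracy is the share of test instances labeled correctly. The sense labels, the toy data, and the function names `train_mfs` and `accuracy` are made up for illustration.

```python
from collections import Counter

def train_mfs(labeled_examples):
    """Map each target word to the sense it carries most often in training."""
    counts = {}
    for word, sense in labeled_examples:
        counts.setdefault(word, Counter())[sense] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def accuracy(predictions, gold):
    """Fraction of instances whose predicted sense equals the gold sense."""
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

# Toy annotated data (hypothetical sense labels, not from any real corpus).
train = [
    ("bank", "bank%finance"), ("bank", "bank%finance"), ("bank", "bank%river"),
    ("plant", "plant%factory"), ("plant", "plant%flora"), ("plant", "plant%flora"),
]
test = [
    ("bank", "bank%finance"), ("bank", "bank%river"),
    ("plant", "plant%flora"), ("plant", "plant%flora"),
]

mfs = train_mfs(train)

# The baseline ignores the context entirely: every occurrence of a word
# receives that word's most frequent training sense.
predictions = [mfs[word] for word, _ in test]
gold = [sense for _, sense in test]
mfs_accuracy = accuracy(predictions, gold)
print(mfs_accuracy)
```

Because the baseline never looks at the context of an occurrence, any gain a supervised system shows over it can be attributed to what the system learned from the contextual features.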
There are several advantages of automated classification: it is most often much more accurate than human-crafted rules because it is data-driven; it is flexible enough to be applied to varying training data; no extra effort is needed to create additional classifiers; and so on. Along with these advantages, supervised machine learning classification has some downsides. The biggest one, as we just mentioned, is that it depends on huge amounts of annotated training data. The sequence of actions that supervised machine learning methods employ is visualized in Figure 2.1 on page 20.
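The general train-then-classify sequence can be sketched as follows. This is only an illustration under stated assumptions: the text does not name a particular learner at this point, so a Naive Bayes classifier with add-one smoothing over bag-of-words context features is swapped in purely as an example. The class `NaiveBayesWSD`, the function `extract_features`, the sense labels, and the toy sentences are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def extract_features(context):
    """Bag-of-words features: the lowercased tokens of the context."""
    return context.lower().split()

class NaiveBayesWSD:
    """Illustrative sense classifier for a single ambiguous word."""

    def __init__(self):
        self.sense_counts = Counter()            # annotated examples per sense
        self.word_counts = defaultdict(Counter)  # token counts per sense
        self.vocab = set()

    def train(self, examples):
        """Collect counts from (context, sense) pairs annotated in advance."""
        for context, sense in examples:
            self.sense_counts[sense] += 1
            for tok in extract_features(context):
                self.word_counts[sense][tok] += 1
                self.vocab.add(tok)

    def predict(self, context):
        """Return the sense with the highest smoothed log-probability."""
        total = sum(self.sense_counts.values())
        best_sense, best_score = None, -math.inf
        for sense, n in self.sense_counts.items():
            score = math.log(n / total)  # log prior of the sense
            denom = sum(self.word_counts[sense].values()) + len(self.vocab)
            for tok in extract_features(context):
                # Add-one smoothing: a token unseen with this sense counts as 1.
                score += math.log((self.word_counts[sense][tok] + 1) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense

# Toy annotated observations for the word "bank" (made up for illustration).
train = [
    ("he deposited money at the bank", "finance"),
    ("the bank approved the loan", "finance"),
    ("they fished from the river bank", "river"),
    ("the bank of the river was muddy", "river"),
]

clf = NaiveBayesWSD()
clf.train(train)
pred_loan = clf.predict("she asked the bank for a loan")
pred_river = clf.predict("the muddy river")
print(pred_loan, pred_river)
```

The sketch also shows the dependence on annotated data noted above: the classifier can only ever return a sense that appears in the training examples, and its predictions rest entirely on the counts gathered from them.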
