
Automatic Extraction of Examples for Word Sense Disambiguation

CHAPTER 5. TIMBL: TILBURG MEMORY-BASED LEARNER

installation is fairly straightforward on the majority of UNIX-based systems. Originally, TiMBL was designed as a solution for linguistic classification tasks; however, it can be used for any other categorization task with appropriate (symbolic or numeric) features and discrete (non-continuous) classes for which training data is available. This last requirement leads us back to the already discussed acute shortage of labeled data.

5.2 Application

As mentioned above, the training data for WSD, namely the data that TiMBL uses in the process of learning, is represented by feature vectors (FVs), whose exact structure is shown in Section 2.3.4. The format of the feature files is flexible, since TiMBL is able to guess the format in most cases. However, we will stick to the most commonly used format: each feature vector is written on a single line, with its features separated by spaces.

As an example, let us consider the situation in which we have a training set (delivered to TiMBL as the file data.train) and a test set (data.test). We run the tool as follows:

> Timbl -f data.train -t data.test

TiMBL then returns a new file, which essentially consists of the data in our test file data.test. However, the system adds a new feature to each FV, which represents the class that it has predicted for the vector. The experiment is conducted with the default parameters of the system, and the results are sent to standard output (or, if needed, written to a separate data file). For more detailed information on the format and content of the results, refer to (Daelemans et al., 2007). The name of the output file encodes the most important information about the conducted experiment.
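The output file name itself is not reproduced in this text. Judging from the parts that are described, it presumably has the form data.test.IB1.O.gr.k1.out; this is an assumption, not a quotation. Under that assumption, a minimal Python sketch can split such a name into its fields. The function name describe_output_name and the dictionary keys are hypothetical and are not part of TiMBL.

```python
from typing import Dict, Union


def describe_output_name(name: str) -> Dict[str, Union[str, int]]:
    """Split an output file name of the assumed form
    <test file>.<algorithm>.<metric>.<weighting>.k<N>.out into its fields.

    The naming scheme is an assumption reconstructed from the prose;
    other option combinations may produce additional fields.
    """
    suffix = ".out"
    if not name.endswith(suffix):
        raise ValueError(f"{name!r} does not end in {suffix!r}")
    stem = name[: -len(suffix)]

    # Split from the right so that dots inside the test file name
    # (e.g. "data.test") stay together in the first field.
    parts = stem.rsplit(".", 4)
    if len(parts) != 5:
        raise ValueError(f"{name!r} does not have the five expected fields")
    test_file, algorithm, metric, weighting, k_field = parts

    if not (k_field.startswith("k") and k_field[1:].isdigit()):
        raise ValueError(f"unexpected neighbor field {k_field!r}")

    return {
        "test_file": test_file,   # e.g. "data.test"
        "algorithm": algorithm,   # e.g. "IB1"
        "metric": metric,         # e.g. "O" (overlap)
        "weighting": weighting,   # e.g. "gr" (gain ratio)
        "k": int(k_field[1:]),    # number of nearest neighbors
    }
```

Calling describe_output_name("data.test.IB1.O.gr.k1.out") yields the test file name, the algorithm, the metric, the weighting scheme and k as separate values, which is convenient when many experiments with different settings are compared.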
The first two parts represent the name of the test file that was used for the analysis (data.test), and together with the suffix .out they identify the output file of the experiment; IB1 represents the memory-based learning algorithm that was employed, the k-NN algorithm in this particular case; O stands for similarity computed with weighted overlap; gr means that the relevance weights were computed with gain ratio; and finally k1 represents the number of most similar patterns in memory on which the output label was based.

If these default settings are the ones needed for the planned experiment, there is not much more to do. However, when we discussed supervised WSD methods we mentioned multiple algorithms that could be employed for the purpose, and TiMBL supports a good selection of them. A similarly wide range of possibilities exists for the distance metrics that TiMBL can use to determine the similarity between instances. All options can be specified directly on the command line before running an experiment with TiMBL:

> Timbl -k 3 -f data.train -t data.test

This command, for instance, reruns the previous experiment, but this time a different number of nearest neighbors (three) is used for extrapolation. The default value is 1, so any other value must be specified explicitly. An option that is very important for us, and which we use further in our work, is the +v n (verbosity) option. It allows us to output the nearest neighbors on which decisions are based. Daelemans et al. (1999) comprehensively describe all options and their possible value ranges.
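To make the role of k and of the nearest neighbors concrete, the following Python sketch implements a toy k-NN classifier over space-separated feature vectors. It is not TiMBL's implementation: it uses plain, unweighted overlap (no gain-ratio weighting), its handling of ties and of k may differ from TiMBL's, and all function names and the toy data are hypothetical. It returns the neighbors together with the label, which loosely mirrors the kind of information the +v n option exposes.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Instance = Tuple[Tuple[str, ...], str]  # (features, class label)


def parse_instance(line: str) -> Instance:
    """Parse one space-separated line; the last field is the class label."""
    fields = line.split()
    if len(fields) < 2:
        raise ValueError("need at least one feature and a class label")
    return tuple(fields[:-1]), fields[-1]


def overlap_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Count the feature positions at which two vectors disagree."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def knn_classify(
    train: List[Instance], features: Sequence[str], k: int = 1
) -> Tuple[str, List[Instance]]:
    """Return (majority label, neighbors) for the k closest training instances."""
    if k < 1 or not train:
        raise ValueError("k must be >= 1 and the training set non-empty")
    # sorted() is stable, so equally distant instances keep their file order.
    ranked = sorted(train, key=lambda inst: overlap_distance(inst[0], features))
    neighbors = ranked[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0], neighbors


def tag_line(train: List[Instance], line: str, k: int = 1) -> str:
    """Append the predicted class to a test line as an extra field."""
    features, _gold = parse_instance(line)
    predicted, _neighbors = knn_classify(train, features, k)
    return f"{line.strip()} {predicted}"


# Hypothetical toy data: four features followed by a sense label.
TRAIN = [
    parse_instance(line)
    for line in (
        "river deep mud NN shore",
        "river cash loan NN finance",
        "money deep loan NN finance",
    )
]
TEST_LINE = "river deep water NN shore"
```

With this toy data, tag_line(TRAIN, TEST_LINE, k=1) bases its decision on the single closest instance, whereas k=3 lets all three training instances vote, so the predicted label can change with k. Inspecting the returned neighbors shows exactly which stored examples drove each decision.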
