
Automatic Extraction of Examples for Word Sense Disambiguation


CHAPTER 2. BASIC APPROACHES TO WORD SENSE DISAMBIGUATION

Decision lists (Yarowsky, 1995; Martínez et al., 2002), as their name suggests, are simple ordered lists of rules of the form (condition, class, weight). Such rules are easier to understand if thought of as if-then-else rules: if the condition is satisfied, then the corresponding class is assigned. The third parameter, the weight, is used to determine the order of the rules in the list: higher weights position a rule higher in the list, while lower weights mean that the rule is found further down in the ordered list. The order of a decision list is important during classification, since the rules are tested sequentially and the first rule that "succeeds" is used to assign the sense to the example. Usually the default rule in a list is the last one, which accepts all remaining cases.
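The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not code from an actual WSD system; the rule conditions, sense labels, and context representation are invented for the example.

```python
# Minimal decision-list sketch: rules are (condition, class, weight) triples.
# Rules are tried in order of decreasing weight; the first condition that is
# satisfied assigns the sense, and the default rule catches all remaining cases.
# The conditions and sense labels below are illustrative only.

def apply_decision_list(rules, default_sense, context_words):
    for condition, sense, weight in sorted(rules, key=lambda r: r[2], reverse=True):
        if condition(context_words):
            return sense
    return default_sense  # the "default rule" accepting all remaining cases

# Hypothetical rules for disambiguating "bank" from its context words.
rules = [
    (lambda ctx: "river" in ctx, "bank/GEOGRAPHY", 0.9),
    (lambda ctx: "money" in ctx, "bank/FINANCE", 0.8),
]

print(apply_decision_list(rules, "bank/FINANCE", {"river", "fishing"}))  # bank/GEOGRAPHY
print(apply_decision_list(rules, "bank/FINANCE", {"quarterly"}))         # bank/FINANCE
```

Note that when both conditions hold, the higher-weighted rule wins, which is exactly the role the weights play in the ordered list.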

Decision trees (Mooney, 1996) are very similar to decision lists, although they are not used as often for word sense disambiguation. They also use classification rules, but the rules are not ordered in a list; instead they form an n-ary branching tree structure that represents the training set. Every branch of the tree represents a rule that tests a conjunction of features, and the prediction of the class label is encoded in a terminal node (also called a leaf node, i.e. a node in a tree data structure that has no child nodes). Some of the problems with decision trees, which make them less suitable for WSD, are their computational cost and the data fragmentation they exhibit (the training data is broken up into many small pieces that are not close together). The latter leads to an immense increase in computation when larger feature spaces are used. The same effect is also triggered by the use of a large number of examples; however, if fewer training instances are provided, a relative decrease in the reliability of the class-label predictions can be observed.
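A toy version of such a tree, hand-built for illustration (the binary features and sense labels are invented, not taken from any real training set), shows how classification descends the branches until a leaf is reached:

```python
# Toy decision tree for WSD: an internal node is a
# (feature, subtree_if_present, subtree_if_absent) triple; a leaf is a sense
# label (a plain string). Features and senses are invented for the example.

def classify_with_tree(node, features):
    while isinstance(node, tuple):          # descend until a leaf is reached
        feature, if_present, if_absent = node
        node = if_present if features.get(feature) else if_absent
    return node

tree = ("ctx_has_river",
        "bank/GEOGRAPHY",
        ("ctx_has_money", "bank/FINANCE", "bank/GEOGRAPHY"))

print(classify_with_tree(tree, {"ctx_has_river": True}))  # bank/GEOGRAPHY
print(classify_with_tree(tree, {"ctx_has_money": True}))  # bank/FINANCE
```

Each root-to-leaf path corresponds to one conjunction of feature tests, which is the fragmentation problem in miniature: every additional feature can double the number of such paths.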

Rule combination for supervised word sense disambiguation means that a set of homogeneous classification rules is combined and learned by a single algorithm.

AdaBoost (Schapire, 2003) is a very frequently used rule combination algorithm. It combines multiple classification rules into a single classifier. The power of AdaBoost is based on the fact that the individual classification rules need not be very accurate; after they are combined, the resulting classifier can reach an arbitrarily low error rate.
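The mechanism can be sketched compactly. The following is a minimal AdaBoost implementation over binary labels {-1, +1} with a toy task and hand-written weak rules (all invented for illustration): boosting reweights the examples so that later rounds focus on earlier mistakes, and the weighted vote of weak rules becomes an accurate combined classifier.

```python
import math

# Compact AdaBoost sketch: weak rules need only beat chance; boosting
# reweights examples so later rounds focus on earlier mistakes, and the
# weighted vote of the selected rules forms the combined classifier.

def adaboost(X, y, weak_rules, rounds):
    n = len(X)
    w = [1.0 / n] * n                       # uniform example weights at start
    ensemble = []                           # list of (alpha, rule) pairs
    for _ in range(rounds):
        # pick the weak rule with the lowest weighted training error
        errs = [sum(wi for wi, xi, yi in zip(w, X, y) if h(xi) != yi)
                for h in weak_rules]
        best = min(range(len(weak_rules)), key=lambda i: errs[i])
        err = max(errs[best], 1e-10)        # avoid log(0) for a perfect rule
        alpha = 0.5 * math.log((1 - err) / err)
        h = weak_rules[best]
        ensemble.append((alpha, h))
        # reweight: misclassified examples gain weight, correct ones lose it
        w = [wi * math.exp(-alpha * yi * h(xi)) for wi, xi, yi in zip(w, X, y)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def boost_predict(ensemble, x):
    return 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1

# Toy task: y = +1 iff both features are positive (logical AND).
X = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
y = [1, -1, -1, -1]
weak = [lambda x: 1 if x[0] > 0 else -1,   # stump on feature 0 (error 1/4 alone)
        lambda x: 1 if x[1] > 0 else -1,   # stump on feature 1 (error 1/4 alone)
        lambda x: -1]                      # constant rule acting as a bias
ens = adaboost(X, y, weak, rounds=5)
print([boost_predict(ens, x) for x in X])  # [1, -1, -1, -1]
```

No single weak rule classifies this data correctly, yet after a few boosting rounds the weighted combination does, which is precisely the point made above.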

Linear classifiers, also called binary classifiers, achieved comparatively low results in the last few decades, and thus the greatest interest in them was in the field of Information Retrieval. These classifiers decide on the classification label based on a linear combination of the features in their FVs; they aim to group the instances with the most similar feature values. The limited amount of work on linear classifiers has resulted in several articles, for example (Mooney, 1996; Escudero et al., 2000b; Bartlett et al., 2004; Abdi et al., 1996; Cohen and Singer, 1999). In case a non-linear problem has to be decided, for which the expressivity of linear classifiers is not sufficient, the use of kernel functions (kernel methods) has been suggested.
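A linear classifier of this kind can be sketched with the classic perceptron training rule (one of many ways to fit a linear classifier; the feature vectors below are invented counts, not data from any cited work): the label is decided by the sign of a linear combination of the feature values.

```python
# Linear (binary) classifier sketch: the sense is decided by the sign of a
# linear combination of the feature vector. Trained here with the perceptron
# update rule; feature vectors and labels are illustrative only.

def linear_predict(weights, bias, fv):
    score = sum(w * f for w, f in zip(weights, fv)) + bias
    return 1 if score >= 0 else -1

def train_perceptron(data, n_features, epochs=10):
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        for fv, label in data:
            if linear_predict(weights, bias, fv) != label:
                # move the separating hyperplane toward the misclassified point
                weights = [w + label * f for w, f in zip(weights, fv)]
                bias += label
    return weights, bias

# Toy FVs: (count of "money" nearby, count of "river" nearby);
# +1 = bank/FINANCE, -1 = bank/GEOGRAPHY.
data = [((2, 0), 1), ((3, 1), 1), ((0, 2), -1), ((1, 3), -1)]
w, b = train_perceptron(data, n_features=2)
print([linear_predict(w, b, fv) for fv, _ in data])  # [1, 1, -1, -1]
```

This works because the toy data is linearly separable; for the non-linear problems mentioned above, no setting of `weights` and `bias` can separate the classes, which is what motivates the move to kernel methods.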
