
Automatic Extraction of Examples for Word Sense Disambiguation


CHAPTER 2. BASIC APPROACHES TO WORD SENSE DISAMBIGUATION

Kernel-based methods try to find more general (not only linear, as noted above) types of relations in the feature vectors (FVs). Their popularity has notably increased in the past few years, which can be seen from their growing participation in recent evaluation campaigns such as Senseval-3 (see Section 4.5). Examples of applications of kernel methods in supervised approaches are the ones described by Murata et al. (2001); Boser et al. (1992); Lee and Ng (2002); Cristianini and Shawe-Taylor (2000); Carpuat et al. (2004); Wu et al. (2004); Popescu (2004); Ciaramita and Johnson (2004).

One of the most popular kernel methods is Support Vector Machines (SVMs), presented by Boser et al. (1992). As Màrquez et al. (2007) report, SVMs are established around the principle of Structural Risk Minimization from Statistical Learning Theory (Vapnik, 1998). In their basic form, SVMs are linear classifiers that view the input data as two sets of vectors in an n-dimensional space. They construct a separating hyperplane in that space (in geometry, a hyperplane is the higher-dimensional generalization of a line), which is used to separate the two data sets. To calculate the margin between those data sets, two parallel hyperplanes are constructed, one on each side of the separating hyperplane, each pushed towards its data set. Naturally, a good separation is considered to be achieved by the separating hyperplane that has the largest distance to the neighboring data points of both sets. In cases where a non-linear classifier is desired, the SVM can be used with a kernel function.
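The kernel trick mentioned above can be illustrated with a minimal, self-contained sketch. The example below uses a kernel perceptron rather than a full SVM (a perceptron is a much simpler max-margin-free linear classifier, but it demonstrates the same idea: replacing dot products with a kernel yields a non-linear decision boundary). The toy feature vectors, sense labels, and the choice of an RBF kernel are illustrative assumptions, not taken from the thesis.

```python
# Kernel perceptron sketch: two toy "senses" (+1 / -1) separated via an
# RBF kernel. This is a simplified stand-in for a kernel-based classifier
# such as an SVM, not an SVM implementation itself.
import math

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel: similarity in an implicit feature space."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def train(X, y, epochs=10):
    """Kernel perceptron: alpha[i] counts the update steps on example i."""
    alpha = [0] * len(X)
    for _ in range(epochs):
        for i, (xi, yi) in enumerate(zip(X, y)):
            s = sum(a * yj * rbf_kernel(xj, xi)
                    for a, xj, yj in zip(alpha, X, y))
            if yi * s <= 0:          # misclassified -> strengthen example i
                alpha[i] += 1
    return alpha

def predict(x, X, y, alpha):
    s = sum(a * yj * rbf_kernel(xj, x) for a, xj, yj in zip(alpha, X, y))
    return 1 if s >= 0 else -1

# Toy context feature vectors for two senses of an ambiguous word.
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
y = [1, 1, -1, -1]
alpha = train(X, y)
print(predict([0.8, 0.2], X, y, alpha))   # -> 1 (near the +1 cluster)
```

Swapping `rbf_kernel` for a plain dot product recovers the linear classifier of the basic form; the kernel is the only place where the input representation enters the algorithm.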

Discourse properties are considered by the Yarowsky bootstrapping algorithm (Yarowsky, 1995). This algorithm is semi-supervised (see Section 2.4), which makes it hard to compare with the other algorithms in this section, but it is considered (Màrquez et al., 2007) relatively important for the subsequent work on bootstrapping for WSD. It uses either automatically or manually annotated training data that is supposed to be complete (to represent each of the senses in the set) but not necessarily big. This initially small set is used together with a supervised learning algorithm to annotate further examples. If the annotation of a given example is accomplished with a sufficiently high degree of confidence, the example is added to the "seed" set and the process is continued.
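The bootstrapping loop just described can be sketched schematically. The sketch below is not Yarowsky's decision-list learner: a nearest-centroid classifier stands in for the supervised component, and the confidence threshold, toy vectors, and sense names are all illustrative assumptions.

```python
# Schematic bootstrapping loop: train on the seed set, label unlabeled
# examples, keep only the confident labels, and repeat until nothing
# confident remains.
import math

def centroid(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def bootstrap(seed, unlabeled, threshold=0.5, max_rounds=10):
    """seed: {sense: [vectors]}; grows as confident examples are added."""
    for _ in range(max_rounds):
        # "Train" on the current seed set: one centroid per sense.
        centroids = {sense: centroid(vs) for sense, vs in seed.items()}
        newly_labeled = []
        for x in unlabeled:
            # Confidence: margin between the two closest sense centroids.
            ranked = sorted((distance(x, c), s) for s, c in centroids.items())
            (d1, best), (d2, _) = ranked[0], ranked[1]
            if d2 - d1 >= threshold:           # confident enough
                newly_labeled.append((x, best))
        if not newly_labeled:
            break                              # nothing confident left
        for x, sense in newly_labeled:
            seed[sense].append(x)
            unlabeled.remove(x)
    return seed

seed = {"sense_1": [[1.0, 0.0]], "sense_2": [[0.0, 1.0]]}
unlabeled = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
seed = bootstrap(seed, unlabeled)
# The two clear-cut examples are absorbed; the ambiguous [0.5, 0.5]
# stays unlabeled because its margin never reaches the threshold.
```

The essential property mirrored here is that growth is conservative: only examples labeled with high confidence enter the seed set, which keeps early mistakes from propagating.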

Similarity-based is the family of methods most relevant to our thesis, and thus we provide more in-depth information about it. However, our aim is still to give an overview of those methods, so that our use of them can be better understood. Approaches of this kind are very often used in supervised WSD because they carry out the disambiguation process in a very simple way. They classify a new example via a similarity metric that compares it to previously seen examples and assigns a sense to it - usually the MFS in a pool of the most similar examples. Over the years, probably because of its increased usage, the approach has gained a wide variety of names: instance-based, case-based, similarity-based, example-based, memory-based, exemplar-based, analogical. As a result of the fact that the data is stored in the memory without any restructuring or abstraction
