11.10.2013 Views

Automatic Extraction of Examples for Word Sense Disambiguation

Automatic Extraction of Examples for Word Sense Disambiguation

Automatic Extraction of Examples for Word Sense Disambiguation

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

CHAPTER 6. AUTOMATIC EXTRACTION OF EXAMPLES FOR WSD 69<br />

word. Be that as it may, looking closely at Tables 7.1 and 7.4 we can note a variation <strong>of</strong> up to 4%<br />

as a result <strong>of</strong> those very few added examples. This means that even with very small number <strong>of</strong><br />

instances, which are carefully selected out <strong>of</strong> a huge pool <strong>of</strong> examples we gain an exceptionally<br />

big variation and in some cases separate word-experts improve with up to 4% <strong>of</strong> accuracy.<br />

feature selection<br />

coarse<br />

P R<br />

fine<br />

P R<br />

all features 67.0 67.0 62.3 62.3<br />

<strong>for</strong>ward 76.9 76.9 72.4 72.4<br />

backward 73.0 73.0 68.9 68.9<br />

MFS 64.5 64.5 55.2 55.2<br />

best per<strong>for</strong>mance 78.5 78.5 73.9 73.9<br />

Table 6.13: System per<strong>for</strong>mance on automatically annotated data without optimization <strong>of</strong> the k<br />

parameter and distance and distribution filtering mixed with the manually annotated training<br />

set.<br />

feature selection<br />

coarse<br />

P R<br />

fine<br />

P R<br />

all features 70.5 70.5 64.9 64.9<br />

<strong>for</strong>ward 78.7 78.7 74.0 74.0<br />

backward 77.6 77.6 73.0 73.0<br />

MFS 64.5 64.5 55.2 55.2<br />

best per<strong>for</strong>mance 79.4 79.4 75.0 75.0<br />

Table 6.14: System per<strong>for</strong>mance on automatically annotated data with optimization <strong>of</strong> the k<br />

parameter and distance and distribution filtering mixed with the manually annotated training<br />

set.<br />

Of course, extending this training set further on manually will result to a lot <strong>of</strong> very expensive<br />

human work. Our approach, however, can be used <strong>for</strong> this purpose. As we saw, from our original<br />

automatically collected training set <strong>of</strong> about 170 000 instance we managed to extract only 141<br />

examples that are at the shortest similarity distance which bring improvement <strong>of</strong> the accuracy.<br />

Moreover, those 141 examples, even if so similar to the ones we already have always bring new<br />

in<strong>for</strong>mation along. Consequently, this new in<strong>for</strong>mation makes the classifier less sensitive to<br />

new examples. As a result we can conclude that adding automatically acquired examples to<br />

the manually prepared data set helps and that our approach reached such good results via an<br />

easy and very computationally ”cheap” method <strong>for</strong> gaining new instances <strong>for</strong> training <strong>of</strong> WSD<br />

systems.<br />

6.8.4 Discussion<br />

The experiments we conducted and described in the previous few sections show that our system<br />

per<strong>for</strong>ms competitive to other state-<strong>of</strong>-art systems. The data that we used, however, has several

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!