
Automatic Extraction of Examples for Word Sense Disambiguation


CHAPTER 6. AUTOMATIC EXTRACTION OF EXAMPLES FOR WSD

The automatically annotated data - advantages:

• Used selectively according to various criteria (which we discussed in our work), it can be extremely valuable and can be employed for the improvement of supervised WSD systems.

• Resources can be created with considerably little effort.

• Data can be extracted for all languages for which corpora are available.

• The granularity of senses can be controlled by the choice of corpora.

• It can provide a large number of examples.

All those advantages and disadvantages lead to the conclusion that automatically and manually annotated data compete mostly with respect to the effort invested in creating the data versus the performance of the final system. However, in our work we showed that automatically annotated data can be used for several different purposes, can achieve good results, and should thus be considered important.

To better visualize the difference between automatically annotated data and manually prepared data, we looked at the system performance on three randomly chosen words (one noun, one verb and one adjective) from the lexical sample. We started with a training set consisting of only 10 randomly chosen instances, gradually added sets of 10 new examples, and observed the results. The curves can be seen in Figure 7.1, Figure 7.2 and Figure 7.3 on pages 93 and 95. The graphs show that the gradual addition of instances generally leads to an increase in accuracy. Even though the curves for the automatically annotated data lie far below those for the manually annotated data, which can also be seen in the poor performance of our unsupervised experiment (see Table 6.9), we have already shown that this performance can be easily improved (refer to Table 6.12). Moreover, the constant growth of corpora ensures that more instances can be easily extracted. How many are enough, of course, remains an open question in the field. However, if we recall the interesting and extreme case of solid, or other words such as lose, talk and treat, we see that a large number of examples does not always lead to good results. This is because, as we noted, the quality of those examples is also exceptionally important.
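The incremental procedure described above (start with 10 randomly chosen instances, add 10 more at a time, record the accuracy after each step) can be sketched as follows. This is only a minimal illustration and not the system used in this work: the function names, the toy data and the most-frequent-sense baseline that stands in for the actual classifier are all assumptions made for the example.

```python
import random
from collections import Counter


def most_frequent_sense_accuracy(train, test):
    """Hypothetical stand-in classifier: always predict the most
    frequent sense seen in the training instances."""
    majority = Counter(sense for _, sense in train).most_common(1)[0][0]
    correct = sum(1 for _, sense in test if sense == majority)
    return correct / len(test)


def learning_curve(pool, test, step=10, seed=0,
                   evaluate=most_frequent_sense_accuracy):
    """Return (training size, accuracy) pairs while the training set
    grows by `step` randomly ordered instances at a time."""
    rng = random.Random(seed)
    shuffled = pool[:]          # copy, so the caller's list is untouched
    rng.shuffle(shuffled)       # random order = random choice of instances
    curve = []
    for size in range(step, len(shuffled) + 1, step):
        curve.append((size, evaluate(shuffled[:size], test)))
    return curve


# Toy data (assumed): (context, sense) pairs for one target word.
pool = [(f"context {i}", "sense_a" if i % 3 else "sense_b") for i in range(40)]
test = [(f"held-out {i}", "sense_a" if i % 4 else "sense_b") for i in range(20)]

curve = learning_curve(pool, test)
for size, accuracy in curve:
    print(size, accuracy)
```

Any function that takes a training and a test list and returns an accuracy can be passed as `evaluate`, so the same loop can be run once on manually annotated instances and once on automatically annotated ones to obtain two comparable curves.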
