
Automatic Extraction of Examples for Word Sense Disambiguation



CHAPTER 2. BASIC APPROACHES TO WORD SENSE DISAMBIGUATION

automatic mapping to the most recent version is usually also provided. Other such resources are: Hector (Atkins, 1993), Longman Dictionary of Contemporary English (Procter, 1978), BalkaNet (Stamou et al., 2002), etc.

In theory, any dictionary could be used as a sense inventory for WSD. In practice, several problems arise. First, dictionaries are not always freely available for research, which is why WordNet has become the de facto standard sense inventory over the last decade. Whether it is well suited to that role is still debated. Because WordNet distinguishes the senses of each word in an extremely fine-grained manner, it is often hard to use for WSD, and there are cases where a coarser distinction is desirable. Calzolari et al. (2002) even argue that using WordNet as a sense inventory in WSD yields worse results than using traditional dictionaries. However, it is not WordNet itself but the use of a predefined sense inventory as such that appears to hinder supervised word sense disambiguation. There have been many attempts to solve the latter problem, and although none of them has completely succeeded, WordNet will continue to be the standard sense inventory for WSD.
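
The effect of sense granularity on measured performance can be illustrated with a minimal sketch. The sense labels for "bank" and the grouping into coarse senses below are invented for illustration and are not taken from WordNet; the point is only that predictions confused within a coarse group count as errors under fine-grained scoring but not under coarse-grained scoring.

```python
# Hypothetical fine-grained sense labels and a hypothetical coarse grouping.
# Neither the identifiers nor the groups come from WordNet.
COARSE_GROUPS = {
    "bank%financial_institution": "bank/FINANCE",
    "bank%bank_building": "bank/FINANCE",
    "bank%river_side": "bank/LAND",
    "bank%slope": "bank/LAND",
}


def accuracy(gold, predicted, groups=None):
    """Share of instances labeled correctly, optionally after coarsening."""
    if groups is not None:
        gold = [groups[sense] for sense in gold]
        predicted = [groups[sense] for sense in predicted]
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


gold = [
    "bank%financial_institution",
    "bank%river_side",
    "bank%bank_building",
    "bank%slope",
]
predicted = [
    "bank%bank_building",  # wrong fine-grained sense, right coarse sense
    "bank%river_side",
    "bank%bank_building",
    "bank%river_side",     # wrong fine-grained sense, right coarse sense
]

fine_accuracy = accuracy(gold, predicted)
coarse_accuracy = accuracy(gold, predicted, COARSE_GROUPS)
print(fine_accuracy, coarse_accuracy)
```

In this toy data, half of the predictions miss the exact fine-grained sense, yet every prediction falls into the correct coarse group.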

Another problem with sense inventories is their compatibility. Each dictionary has its own granularity and representation of senses, which are normally extremely hard, if not impossible, to map onto one another. Systems that use different sense inventories therefore cannot be compared directly, since their performance is bound to the inventory they use. Because this is a well-known problem, evaluation exercises (see Chapter 4) either use a single sense inventory for all participating systems or require that systems using a different inventory provide a mapping to the standard one.
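
As a rough sketch of what such a mapping involves, the following code translates a system's own sense labels into a standard inventory before scoring. All labels, the mapping table and the function names are hypothetical, and the choice to count unmapped senses as errors is an assumption made here, not a rule stated by any particular evaluation exercise.

```python
# Hypothetical mapping from a system-specific inventory to the standard one.
TO_STANDARD = {
    "plant.A": "plant#1",  # assumed: industrial facility
    "plant.B": "plant#2",  # assumed: living organism
}


def map_to_standard(labels, mapping):
    """Translate labels; senses without a mapping become None."""
    return [mapping.get(label) for label in labels]


def score(gold, predicted):
    """Share of instances whose mapped label equals the gold label."""
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


system_output = ["plant.A", "plant.B", "plant.C"]  # "plant.C" has no mapping
gold = ["plant#1", "plant#2", "plant#2"]

mapped = map_to_standard(system_output, TO_STANDARD)
result = score(gold, mapped)
print(mapped, result)
```

The unmapped label shows why incomplete mappings matter: a system may be penalized for a sense distinction that the standard inventory simply does not contain.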

2.3.2 Source Corpora

One of the biggest problems for supervised word sense disambiguation (the knowledge acquisition bottleneck) is that very few annotated corpora are available for training good and reliable broad-coverage systems. This is because creating such corpora requires highly laborious human effort. The method's heavy dependence on the available corpora is the reason why supervised WSD is extremely difficult, if not impossible, for languages other than English. A brief description of the main data sources for supervised WSD follows.

Senseval provides several corpora, not only for English but also for languages such as Italian, Basque, Catalan, Chinese, Romanian and Spanish. The most recent edition of Senseval (Senseval-3) resulted in the following annotated corpora:

- English all words - 5000 words from Penn Treebank text (Marcus et al., 1993) were tagged with WordNet senses.

- English lexical sample - 57 words collected via the Open Mind Word Expert interface (Mihalcea and Chklovski, 2003) with the WordNet sense inventory.
