11.10.2013 Views

Automatic Extraction of Examples for Word Sense Disambiguation

Automatic Extraction of Examples for Word Sense Disambiguation

Automatic Extraction of Examples for Word Sense Disambiguation

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

CHAPTER 6. AUTOMATIC EXTRACTION OF EXAMPLES FOR WSD 70<br />

advantages but also some disadvantages, which can be clearly seen in the results <strong>of</strong> the experi-<br />

ments and which we will shortly discuss in this section. Since we used automatically prepared<br />

data together with the manually annotated one, we had the chance to observe the following<br />

issues:<br />

The manually annotated data - disadvantages:<br />

• The preparation <strong>of</strong> the data implies an extremely laborious, costly and expensive/intensive<br />

process <strong>of</strong> human annotation, which can hardly be accomplished in a very little time.<br />

• The data is <strong>of</strong> a significantly small amount, which is the effect <strong>of</strong> its costly preparation.<br />

• As a result <strong>of</strong> the manual annotation, the nature <strong>of</strong> the training data is optimally adapted<br />

to the nature <strong>of</strong> the test data.<br />

– proportional distribution <strong>of</strong> occurring senses<br />

– very similar length <strong>of</strong> the context<br />

– cover only a subset <strong>of</strong> the actual senses <strong>of</strong> the word<br />

• The above mentioned issue leaves hardly any possibility <strong>for</strong> improvement, except by a<br />

carefully and precisely filtered external data.<br />

• The preparation <strong>of</strong> such data implies also predefined semantic resources, which:<br />

– are not available in all languages<br />

– differ considerably in the quality<br />

– provide granularity <strong>of</strong> senses <strong>of</strong>ten different than the one needed <strong>for</strong> the task<br />

The manually annotated data - advantages:<br />

• As a result <strong>of</strong> its careful preparation the manually annotated data leads to exceptionally<br />

good system per<strong>for</strong>mance, outper<strong>for</strong>ming unsupervised and semi-supervised methods.<br />

• Employs only well selected senses that are predefined in the training data.<br />

• Has an extremely homogeneous nature, which ensures good per<strong>for</strong>mance <strong>of</strong> the system.<br />

The automatically annotated data - disadvantages:<br />

• Used without any additional processing could lead to poor per<strong>for</strong>mance <strong>of</strong> the WSD system.<br />

• The derived senses <strong>for</strong> the target word are specific <strong>for</strong> the corpus they are extracted from.<br />

• Is never as homogeneous as manually prepared data and thus can lead to poorer results.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!