Development of a Stemmer for the Greek Language - SAIS
Development of a Stemmer for the Greek Language - SAIS
Development of a Stemmer for the Greek Language - SAIS
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
A parallel corpus stemmer is language independent and it has successfully been<br />
used by o<strong>the</strong>r researchers (Yarowsky 2000, Diab & Resnik 2002).<br />
1.1.6 <strong>Greek</strong> <strong>Stemmer</strong>s<br />
On 2001, Tambouratzis and Carayannis (Tambouratzis & Carayannis 2001)<br />
presented a system that per<strong>for</strong>ms an automated morphological categorization <strong>of</strong><br />
<strong>Greek</strong> words extracted from a corpus, <strong>for</strong> <strong>the</strong> Institute <strong>for</strong> <strong>Language</strong> and Speech<br />
Processing (ILSP) in Greece. The aim <strong>of</strong> <strong>the</strong> Automated Morphological Processor<br />
(AMP), whose structure is outlined in Figure 2, is to per<strong>for</strong>m <strong>the</strong> segmentation <strong>of</strong><br />
a given set <strong>of</strong> words into stems and endings in an automated manner. The<br />
algorithm is using rule-based iterative matching-and-masking approach, which<br />
relies on matching parts <strong>of</strong> different patterns. Here <strong>the</strong> stemming process is based<br />
on an initial set <strong>of</strong> valid stems and endings. There is also an assumption that each<br />
word consists <strong>of</strong> a stem part and an ending part excluding <strong>the</strong> compound words.<br />
Figure 2: Automated Morphological Processor (AMP) overview<br />
5