Development of a Stemmer for the Greek Language - SAIS
Development of a Stemmer for the Greek Language - SAIS
Development of a Stemmer for the Greek Language - SAIS
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
But it is not <strong>the</strong> same <strong>for</strong> <strong>the</strong> verbs. Every verb has two stems <strong>for</strong> each voice<br />
(passive and active); <strong>the</strong> “present stem” and <strong>the</strong> “past stem”. The tenses are<br />
<strong>for</strong>med according to <strong>the</strong>se stems. Moreover, <strong>the</strong> verbs starting with a consonant<br />
take <strong>the</strong> letter “ε” in some <strong>of</strong> <strong>the</strong> past tenses. The various <strong>for</strong>mations <strong>of</strong> <strong>the</strong> verb<br />
“∆ΕΝΩ” (tide) presented on <strong>the</strong> Table 5.<br />
Tenses Verb Stems Suffixes<br />
∆ΕΝΩ<br />
(I tide)<br />
∆ΕΝ Ω<br />
ΘΑ ∆ΕΝΩ ∆ΕΝ Ω<br />
Present Tenses (I will be tiding)<br />
∆ΕΝΕ<br />
(tide)<br />
∆ΕΝ Ε<br />
∆ΕΝΟΝΤΑΣ<br />
(tiding)<br />
∆ΕΝ ΟΝΤΑΣ<br />
Ε∆ΕΣΑ<br />
(I tided)<br />
∆ΕΣ Α<br />
ΝΑ ∆ΕΣΩ<br />
(to tide)<br />
∆ΕΣ Ω<br />
Past Tenses ΝΑ ∆ΕΣΩ<br />
(to tide)<br />
∆ΕΣ Ω<br />
∆ΕΣΕ<br />
(tide)<br />
∆ΕΣ Ε<br />
∆ΕΣΕΙ<br />
(I have tide)<br />
∆ΕΣ ΕΙ<br />
Table 5: Different stems <strong>of</strong> verbs<br />
2.6 Agreements <strong>for</strong> <strong>the</strong> <strong>Greek</strong> <strong>Stemmer</strong><br />
Be<strong>for</strong>e we continue with <strong>the</strong> presentation <strong>of</strong> <strong>the</strong> stemming algorithm it is<br />
important to note <strong>the</strong> agreements that took place during this stemmer development.<br />
As <strong>the</strong> <strong>Greek</strong> language is highly inflectional, <strong>the</strong> suffix removal process could be<br />
very time-consuming and <strong>the</strong> algorithm could be very complicated using long<br />
corpus and many exceptions. On <strong>the</strong> o<strong>the</strong>r hand, if we have too general rules we<br />
may reduce two semantically different words to <strong>the</strong> same root (overstemming).<br />
The algorithm works <strong>for</strong> <strong>the</strong> <strong>Greek</strong> words in capital letters as we mentioned in <strong>the</strong><br />
paragraph 1.6 on <strong>the</strong> limitations <strong>of</strong> <strong>the</strong> system. Using characters in upper case we<br />
16