Development of a Stemmer for the Greek Language - SAIS
Of course, there are suffixes which do not conflict and can easily be removed. For
example, if we set a rule that removes the suffix “ΑΣ” from any given word, there is
no conflict with other words. This general rule covers a large number of words in
the Greek language.
Finally, the rules, generic or specific, are always applied to the longest
matching suffix in the list. So when we have the suffixes “Α” and “ΑΤΑ” in the
suffix list, the word “ΚΥΜΑΤΑ” (waves) will be reduced to the stem “ΚΥΜ”
and not “ΚΥΜΑΤ”.
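The longest-suffix behaviour described above can be illustrated with a minimal Python sketch. The function name `strip_longest_suffix` and the guard that keeps at least one character of the word are assumptions made for this illustration; they are not taken from the stemmer itself.

```python
def strip_longest_suffix(word, suffixes):
    """Remove the longest suffix in `suffixes` that `word` ends with.

    Hypothetical helper for illustration only.
    """
    # Try longer suffixes first so that "ΑΤΑ" is preferred over "Α".
    for suffix in sorted(suffixes, key=len, reverse=True):
        # Assumption: never reduce a word to an empty stem.
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    # No suffix in the list matched; return the word unchanged.
    return word


SUFFIXES = ["Α", "ΑΤΑ"]
print(strip_longest_suffix("ΚΥΜΑΤΑ", SUFFIXES))  # ΚΥΜ
```

If only “Α” were in the list, the same call would return “ΚΥΜΑΤ”, which is the result the longest-match policy is meant to avoid.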
3.2 The rules<br />
Trying to deal with each suffix individually, we have created a decentralized<br />
algorithm. The different rules are presented below in pseudo-code:<br />
Rule-set 1<br />
if (word ends on ΑΔΕΣ|ΑΔΩΝ){
    remove the suffix;
    if (remaining part does not end on ΟΚ|ΜΑΜ|ΜΑΝ…){
        add “ΑΔ”;
    }
}
The rule removes the suffixes ΑΔΕΣ and ΑΔΩΝ for a group of words.
Example:<br />
ΓΙΑΓΙΑ → ΓΙΑΓΙ
ΓΙΑΓΙΑΔΩΝ → ΓΙΑΓΙ
The rule doesn’t affect the group of words that by chance have similar suffixes.
Example:<br />
ΟΜΑΔΑ → ΟΜΑΔ
ΟΜΑΔΕΣ → ΟΜΑΔ
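As a rough illustration, Rule-set 1 could be sketched in Python as follows. The names `apply_rule_set_1`, `RULE1_SUFFIXES` and `RULE1_EXCEPTIONS` are hypothetical. The exception tuple contains only the three endings that are spelled out in the pseudo-code; the rest of the list is elided there and is left out here rather than guessed. The input ΜΑΜΑΔΕΣ is an illustrative word chosen for this sketch, not an example from the text.

```python
RULE1_SUFFIXES = ("ΑΔΕΣ", "ΑΔΩΝ")
# Only the endings listed explicitly in the pseudo-code; the original list
# continues after "ΜΑΝ" and is deliberately left incomplete here.
RULE1_EXCEPTIONS = ("ΟΚ", "ΜΑΜ", "ΜΑΝ")


def apply_rule_set_1(word):
    """Sketch of Rule-set 1: strip ΑΔΕΣ/ΑΔΩΝ, restoring ΑΔ where needed."""
    for suffix in RULE1_SUFFIXES:
        if word.endswith(suffix):
            remaining = word[: -len(suffix)]
            # str.endswith accepts a tuple and checks each ending in turn.
            if not remaining.endswith(RULE1_EXCEPTIONS):
                # "ΑΔ" belongs to the stem itself, so put it back.
                remaining += "ΑΔ"
            return remaining
    # The rule does not apply to this word.
    return word


print(apply_rule_set_1("ΟΜΑΔΕΣ"))   # ΟΜΑΔ
print(apply_rule_set_1("ΜΑΜΑΔΕΣ"))  # ΜΑΜ
```

With ΟΜΑΔΕΣ the remainder “ΟΜ” matches none of the listed endings, so “ΑΔ” is restored and the stem stays ΟΜΑΔ. With ΜΑΜΑΔΕΣ the remainder “ΜΑΜ” is in the list, so the suffix is dropped entirely.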
Rule-set 2<br />
if (word ends on ΕΔΕΣ|ΕΔΩΝ){
    remove the suffix;
    if (remaining part ends on ΟΠ|ΙΠ|ΕΜΠ…){
        add “ΕΔ”;
    }
}
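A comparable Python sketch of Rule-set 2 is given here for illustration. Note that its inner condition is the reverse of Rule-set 1: “ΕΔ” is restored only when the remainder ends in one of the listed endings. The names `apply_rule_set_2`, `RULE2_SUFFIXES` and `RULE2_RESTORE_AFTER` are hypothetical, the tuple holds only the three endings spelled out in the pseudo-code (the rest of the list is elided there), and the input words are illustrative choices rather than examples from the text.

```python
RULE2_SUFFIXES = ("ΕΔΕΣ", "ΕΔΩΝ")
# Only the endings listed explicitly in the pseudo-code; the original list
# continues after "ΕΜΠ" and is deliberately left incomplete here.
RULE2_RESTORE_AFTER = ("ΟΠ", "ΙΠ", "ΕΜΠ")


def apply_rule_set_2(word):
    """Sketch of Rule-set 2: strip ΕΔΕΣ/ΕΔΩΝ, restoring ΕΔ after listed endings."""
    for suffix in RULE2_SUFFIXES:
        if word.endswith(suffix):
            remaining = word[: -len(suffix)]
            # Unlike Rule-set 1, "ΕΔ" is added back only for listed endings.
            if remaining.endswith(RULE2_RESTORE_AFTER):
                remaining += "ΕΔ"
            return remaining
    # The rule does not apply to this word.
    return word


print(apply_rule_set_2("ΚΑΦΕΔΕΣ"))     # ΚΑΦ
print(apply_rule_set_2("ΣΤΡΑΤΟΠΕΔΩΝ"))  # ΣΤΡΑΤΟΠΕΔ
```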