22.01.2013 Views

Development of a Stemmer for the Greek Language - SAIS


Of course, there are suffixes which do not conflict and can easily be removed. For example, if we set a rule that removes the suffix “ΑΣ” for any given word, there is no conflict with other words. This general rule covers a large number of words in the Greek language.

Finally, the rules, generic or specific, are applied each time for the longest possible suffix in the list. So when we have the suffixes “Α” and “ΑΤΑ” in the suffix list, the word “ΚΥΜΑΤΑ” (waves) will be reduced to the stem “ΚΥΜ” and not “ΚΥΜΑΤ”.
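The longest-suffix policy can be illustrated with a minimal Python sketch. The function name `strip_longest_suffix` and the two-entry suffix list are illustrative assumptions; only the suffixes “Α” and “ΑΤΑ” and the word “ΚΥΜΑΤΑ” come from the text, and this is not the stemmer's actual implementation.

```python
# Illustrative suffix list: only "Α" and "ΑΤΑ" are taken from the text.
SUFFIXES = ["Α", "ΑΤΑ"]


def strip_longest_suffix(word, suffixes=SUFFIXES):
    """Remove the longest suffix in `suffixes` that `word` ends with.

    Hypothetical helper: it sketches the matching order only, not the
    generic or specific rules attached to each suffix.
    """
    # Try longer suffixes first so that "ΑΤΑ" is preferred over "Α".
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix):
            return word[:-len(suffix)]
    return word  # no suffix matched; leave the word unchanged


print(strip_longest_suffix("ΚΥΜΑΤΑ"))         # ΚΥΜ
print(strip_longest_suffix("ΚΥΜΑΤΑ", ["Α"]))  # ΚΥΜΑΤ (only the short suffix is known)
```

Sorting by length before matching is one simple way to get this behaviour; a real suffix list would be much longer than the one assumed here.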

3.2 The rules<br />

Trying to deal with each suffix individually, we have created a decentralized algorithm. The different rules are presented below in pseudo-code:

Rule-set 1<br />

if (word ends on ΑΔΕΣ|ΑΔΩΝ){
    remove the suffix;
    if (remaining part does not end on ΟΚ|ΜΑΜ|ΜΑΝ…){
        add “ΑΔ”;
    }
}

The rule removes the suffixes ΑΔΕΣ and ΑΔΩΝ for a group of words.

Example:

ΓΙΑΓΙΑ → ΓΙΑΓΙ
ΓΙΑΓΙΑΔΩΝ → ΓΙΑΓΙ

The rule doesn’t affect the group of words that by chance have similar endings; for these, “ΑΔ” is added back, so the stem is preserved.

Example:

ΟΜΑΔΑ → ΟΜΑΔ
ΟΜΑΔΕΣ → ΟΜΑΔ
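Rule-set 1 might be sketched in Python as follows. The function and constant names are hypothetical. The exception list in the pseudo-code is truncated (“…”), so only “ΟΚ”, “ΜΑΜ” and “ΜΑΝ” come from the source; “ΓΙΑΓΙ” is included here as an assumption inferred from the ΓΙΑΓΙΑΔΩΝ example, and the rest of the list is left out. Delta is written as the Greek capital letter Δ.

```python
RULE1_SUFFIXES = ("ΑΔΕΣ", "ΑΔΩΝ")

# "ΟΚ", "ΜΑΜ", "ΜΑΝ" are from the pseudo-code, whose list is truncated.
# "ΓΙΑΓΙ" is an ASSUMPTION inferred from the example ΓΙΑΓΙΑΔΩΝ -> ΓΙΑΓΙ.
RULE1_EXCEPTIONS = ("ΟΚ", "ΜΑΜ", "ΜΑΝ", "ΓΙΑΓΙ")


def rule_set_1(word):
    """Sketch of Rule-set 1: strip ΑΔΕΣ/ΑΔΩΝ, restoring ΑΔ when needed."""
    for suffix in RULE1_SUFFIXES:
        if word.endswith(suffix):
            stem = word[:-len(suffix)]
            # Unless the remainder ends in a listed stem, the "ΑΔ" belonged
            # to the word itself, so it is added back.
            if not stem.endswith(RULE1_EXCEPTIONS):
                stem += "ΑΔ"
            return stem
    return word  # the rule does not apply


print(rule_set_1("ΓΙΑΓΙΑΔΩΝ"))  # ΓΙΑΓΙ
print(rule_set_1("ΟΜΑΔΕΣ"))     # ΟΜΑΔ
```

The forms ΓΙΑΓΙΑ and ΟΜΑΔΑ in the examples do not end in ΑΔΕΣ or ΑΔΩΝ, so they pass through this sketch unchanged; their stems would be produced by other rules.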

Rule-set 2<br />

if (word ends on ΕΔΕΣ|ΕΔΩΝ){
    remove the suffix;
    if (remaining part ends on ΟΠ|ΙΠ|ΕΜΠ…){
        add “ΕΔ”;
    }
}
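Note that the inner test in Rule-set 2 is the inverse of the one in Rule-set 1: here “ΕΔ” is restored only when the remainder does end in a listed string. A minimal Python sketch under the same caveats (hypothetical names, and a list limited to the three entries visible before the “…” in the pseudo-code) could look like this. The sample word ΚΑΦΕΔΕΣ (coffees) is not taken from the source.

```python
RULE2_SUFFIXES = ("ΕΔΕΣ", "ΕΔΩΝ")

# Only the entries visible in the pseudo-code; the source list is truncated.
RULE2_RESTORE_AFTER = ("ΟΠ", "ΙΠ", "ΕΜΠ")


def rule_set_2(word):
    """Sketch of Rule-set 2: strip ΕΔΕΣ/ΕΔΩΝ, restoring ΕΔ for listed stems."""
    for suffix in RULE2_SUFFIXES:
        if word.endswith(suffix):
            stem = word[:-len(suffix)]
            # Positive condition: "ΕΔ" is added back only when the remainder
            # ends in one of the listed strings.
            if stem.endswith(RULE2_RESTORE_AFTER):
                stem += "ΕΔ"
            return stem
    return word  # the rule does not apply


print(rule_set_2("ΚΑΦΕΔΕΣ"))  # ΚΑΦ
```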
