25.08.2013 Views

PDF (Online Text) - EURAC


Table 8: Sample Results of Noun Guessing for Classes 7 and 8

Class 7 cands.   Class 8 cands.   N?   Equivalent(s)
selo             dilo             +    thing, things
setšhaba         ditšhaba         +    nation, nations
sello            dillo            +    (out)cry, outcries
sepetše          *dipetše         -    walked
sekelela         dikelela         -    recommend, disappear

The checking tool is robust against nonexistent forms (cf. *dipetše) and, owing to the context constraint on agreement-bearing items, against forms that are not nominal (cf. sekelela versus dikelela).
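The pairing step behind Table 8 can be sketched in a few lines of Python. This is a minimal illustration, not the tool itself, which is implemented as corpus queries: the function name `guess_class7_8_pairs` is hypothetical, the vocabulary is just the forms from Table 8, and the context constraint on agreement-bearing items is deliberately not modelled.

```python
def guess_class7_8_pairs(vocabulary):
    """Pair each class 7 candidate (se-) with its class 8 candidate (di-).

    Returns a dict mapping the se- form to the di- form if that form is
    attested in the vocabulary, and to None otherwise.
    Assumption: attestation in the word list stands in for a corpus lookup.
    """
    vocab = set(vocabulary)
    pairs = {}
    for form in sorted(vocab):
        # Only forms with the se- prefix and a non-empty stem are candidates.
        if not form.startswith("se") or len(form) <= 2:
            continue
        plural = "di" + form[2:]
        pairs[form] = plural if plural in vocab else None
    return pairs


# Word forms taken from Table 8 (*dipetše is unattested, so it is absent).
vocab = ["selo", "dilo", "setšhaba", "ditšhaba", "sello", "dillo",
         "sepetše", "sekelela", "dikelela"]
pairs = guess_class7_8_pairs(vocab)
```

In this sketch sepetše is rejected because its plural candidate is unattested, whereas sekelela and dikelela still pair up. Excluding such verbal forms requires the context constraint described above.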

A first qualitative evaluation of the noun guessing routines on all candidates from the 43,000-word corpus sample suggests that the tool fails only on lexicalized irregular forms (e.g. mong - beng, ‘owner(s)’, instead of the hypothetical mong - *bang), and on nouns that, mostly for semantic reasons, do not have both a singular and a plural form (such as Sepedi ‘Pedi language and culture’, or leboa ‘North’). Like the verb guesser, the noun guesser can, and for quantitative reasons must, be applied to any new corpus to be annotated.

3.6 Rules for the Disambiguation of Closed Class Items

Given the high degree of ambiguity in closed class items (see section 2.3), disambiguation strategies for these items are much needed. Although a statistical tagger is designed for this type of disambiguation, rule-based preprocessing that reduces the ambiguity at least partially seems necessary.

We use context-based disambiguation rules in the spirit of Gross and Silberztein’s local grammars (Silberztein 1993) and of rule-based tagging. Like the noun guessing queries, the disambiguation rules are implemented as queries in the CQP language. Some rules rely exclusively on lexical contexts (cf. the topmost part of Table 9), others involve lexemes and word-class-tagged items (middle row), and still others combine lexical, categorical and morphological constraints, including, for example, the presence of certain affixes (cf. the lower part of Table 9). The examples in Table 9 all relate to the disambiguation of the form a, the most frequent and most ambiguous item in our sample (cf. Table 5).
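The three kinds of context constraints can be illustrated with a small Python sketch. It is a stand-in for the CQP queries, not a rendering of them: the reading labels (`R1`, `R2`, `R3`), the neighbouring word forms, the part-of-speech values and the suffix `xx` are all placeholders invented for the example and make no claim about Northern Sotho or about the contents of Table 9. Each rule narrows the candidate readings of the target form rather than choosing a single one, which mirrors the partial reduction of ambiguity aimed at here.

```python
import re


def matches(token, constraint):
    """True if every attribute regex in the constraint fully matches the token."""
    return all(re.fullmatch(pattern, token.get(attr, ""))
               for attr, pattern in constraint.items())


def reduce_ambiguity(tokens, target, rules):
    """Narrow the candidate readings of `target` using left/right context rules."""
    reduced = []
    for i, tok in enumerate(tokens):
        candidates = set(tok["candidates"])
        if tok["word"] == target:
            left = tokens[i - 1] if i > 0 else {}
            right = tokens[i + 1] if i + 1 < len(tokens) else {}
            for rule in rules:
                if matches(left, rule["left"]) and matches(right, rule["right"]):
                    narrowed = candidates & rule["keep"]
                    if narrowed:  # never discard every reading
                        candidates = narrowed
        reduced.append((tok["word"], candidates))
    return reduced


# Hypothetical rules; all labels and forms are placeholders.
RULES = [
    # 1. purely lexical context
    {"left": {}, "right": {"word": "zzz"}, "keep": {"R1"}},
    # 2. word-class context on both sides
    {"left": {"pos": "N"}, "right": {"pos": "V"}, "keep": {"R1", "R2"}},
    # 3. word class plus a morphological (suffix) constraint
    {"left": {}, "right": {"pos": "V", "word": ".*xx"}, "keep": {"R3"}},
]

tokens = [
    {"word": "noun1", "pos": "N", "candidates": {"N"}},
    {"word": "a", "candidates": {"R1", "R2", "R3"}},
    {"word": "verb1", "pos": "V", "candidates": {"V"}},
]
result = reduce_ambiguity(tokens, "a", RULES)
```

Only the second rule fires on this placeholder sentence, so the form a keeps two of its three readings. The remaining ambiguity would then be left to the statistical tagger.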
