
GWC 2008


Verification of Valency Frame Structures by Means of Automatic…

• morphological and syntactic features, i.e. frequent surface expressions (e.g. a
particular preposition for nouns, the aspect form for infinitives, an adverb from a
particular group, etc.).

The described structure is used for automatic disambiguation [3]: the parse of the
phrase or sentence is mapped against the valency frames of the words in the
construction. If the parse structure matches the obligatory valency frames, it is
considered verified; otherwise, the optional valency frames may indicate the preferred
analysis.
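The matching step can be illustrated with a minimal Python sketch. It is not the procedure of [3]: the frame layout, the surface-form labels (`NP.acc`, `PP.to`, `ADV.manner`) and the function names are hypothetical, and each dependent is allowed to fill more than one slot, which a real matcher would have to rule out.

```python
# Hypothetical toy frame: each slot is the set of surface forms that may express it.
FRAME = {
    "obligatory": [{"NP.acc"}, {"NP.dat", "PP.to"}],
    "optional": [{"ADV.manner"}],
}


def obligatory_filled(frame, dependents):
    """True if every obligatory slot is expressed by at least one dependent."""
    return all(
        any(form in slot for form in dependents)
        for slot in frame["obligatory"]
    )


def optional_score(frame, dependents):
    """Number of optional slots expressed by at least one dependent."""
    return sum(
        any(form in slot for form in dependents)
        for slot in frame["optional"]
    )


def choose_analysis(frame, candidate_parses):
    """Return (parse, verified) for a non-empty list of candidate parses.

    A parse that fills all obligatory slots counts as verified. If none
    does, the parse that fills the most optional slots is preferred.
    """
    verified = [p for p in candidate_parses if obligatory_filled(frame, p)]
    if verified:
        return verified[0], True
    return max(candidate_parses, key=lambda p: optional_score(frame, p)), False


# Each candidate parse is reduced to the surface forms of the word's dependents.
verified_case = choose_analysis(FRAME, [["NP.acc"], ["NP.acc", "PP.to"]])
fallback_case = choose_analysis(FRAME, [["NP.acc"], ["NP.acc", "ADV.manner"]])
```

In the first call the second candidate fills both obligatory slots and is returned as verified; in the second call no candidate does, so the optional slot decides the preference.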

The open question is whether inheritance of valency frame parameters exists in
troponymic verb trees, in what form, and for which parameters. We carried out
preliminary research based on three semantic verb trees in RussNet [8]. Although we
did not obtain unequivocal confirmation of the inheritance scheme, it may be stated
that the context features of most verbs from a particular semantic tree share many
statistically stable parameters.

In order to prove this stability we investigated the automatic clustering of verb
contexts as representatives of WMs in RussNet. This research is presented below.

3 Motivation: How to Use a Morphological Marking up of the Text for Sense Disambiguation

In our research we took the approach of [4] as a starting point. That work described
a disambiguation procedure for the verb serve in which three distributions were used:
main POS tags, additional markers (e.g. punctuation marks, prepositions, etc.), and
lexical items. The investigation demonstrated that POS tags and some other features
make it possible to differentiate the meanings of this polysemous verb reliably
(80–83%). The results depend on the width of the analysis window and on having a
substantial number of contexts in the training set.
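As a rough illustration of what a POS tag distribution over an analysis window looks like, the following Python sketch counts the tags found at each position around a target verb. It is only a sketch under assumptions: the function name, the tag labels and the two toy sentences are invented for the example and do not reproduce the feature set or the data of [4].

```python
from collections import Counter


def pos_window_distribution(tagged_sentences, target, width=2):
    """Relative frequency of (offset, POS tag) pairs around each target occurrence.

    tagged_sentences: list of sentences, each a list of (word, tag) pairs.
    width: number of positions inspected on each side of the target.
    """
    counts = Counter()
    for sentence in tagged_sentences:
        for i, (word, _tag) in enumerate(sentence):
            if word != target:
                continue
            for offset in range(-width, width + 1):
                j = i + offset
                # Skip the target itself and positions outside the sentence.
                if offset != 0 and 0 <= j < len(sentence):
                    counts[(offset, sentence[j][1])] += 1
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()} if total else {}


# Invented toy contexts for the verb "serve".
contexts = [
    [("waiters", "NOUN"), ("serve", "VERB"), ("the", "DET"), ("food", "NOUN")],
    [("they", "PRON"), ("serve", "VERB"), ("as", "ADP"), ("guides", "NOUN")],
]
distribution = pos_window_distribution(contexts, "serve", width=1)
```

Widening `width` adds more positions to the distribution, which is one way to see why the results reported above depend on the window size.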

The authors drew the following conclusions when comparing their approach with similar ones:
1. initial processing of the text (e.g. syntactically connected fragments) does not
affect the results crucially;
2. it was not possible to differentiate low-frequency WMs, because it was hardly
possible to compile a quality training set;
3. it was easier to differentiate homonyms (or contrasting WMs) than similar WMs;
4. a very large training set improves the results, but not in proportion to the
increase in the processing time needed to prepare such sets.

3.1 Preliminary Results: the POS Tag Distribution for Different Semantic Verb Groups

The research mentioned above might be valid only for languages with a fixed word
order, and may be hardly applicable to languages with freer word order. To begin with,
we decided to carry out a “pilot” check: to compare the distributions of 9 verbs from
two groups: verbs of movement (идти ‘to go’, пойти ‘to start walking’, выйти ‘to go
out’, вернуться ‘to return’, ходить ‘to walk’) and communication verbs (сказать ‘to
say’, говорить ‘to speak’, спросить ‘to ask’, ответить ‘to answer’, просить ‘to
