
GWC 2008


Verification of Valency Frame Structures by Means of Automatic… 39

part of the context (e.g. conjunctions for communication verbs) or left context (e.g. prepositions for communication verbs). Fig. 1 shows the correlated distribution of prepositions in the contexts of verbs of movement, and Fig. 2 shows the uncorrelated distribution of adjectives for verbs of communication.

The distribution data showed that they can be used to differentiate some groups of verbs automatically; however, the width of the distribution analysis window and the appropriate tag set need to be examined in detail.

4 The Optimal Width of the Distribution Window

The research reported in this paper involves 51 frequent Russian verbs from 21 semantic groups taken from [9]. Each verb was represented by 200 contexts chosen at random from our corpus; these contexts were unambiguously marked up with morphological tags. We first chose the maximal window of [-10…+10] positions and a tag set consisting of the POS marker plus the case for substantives and the aspect for verbs (e.g. Nnom, Aloc, Vperf, etc.). All punctuation marks are represented by the single tag PM. Tag distributions were calculated at every position of the maximal window over the contexts of each verb, so that each position is represented by a vector of tag frequencies. If a tag does not occur in the i-th position, its frequency is zero.
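The positional tag distributions described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code. It assumes (hypothetically) that each context is given as a pair of a tag list and the index of the target verb in that list, and it skips position 0 (the verb itself). A `Counter` returns 0 for a tag never seen at a position, which matches the rule that an absent tag has frequency zero.

```python
from collections import Counter

WINDOW = 10  # maximal window [-10 ... +10], as in the text


def positional_distributions(contexts, window=WINDOW):
    """Count tag frequencies at each window position relative to the verb.

    Assumption: each context is a hypothetical (tags, verb_index) pair, where
    tags is a list such as ["Nnom", "Vperf", "Aloc", "PM"] and verb_index
    points at the target verb. Position 0 (the verb itself) is skipped.
    """
    dists = {i: Counter() for i in range(-window, window + 1) if i != 0}
    for tags, verb_index in contexts:
        for i in dists:
            j = verb_index + i
            # Positions falling outside the sentence contribute nothing.
            if 0 <= j < len(tags):
                dists[i][tags[j]] += 1
    return dists


# Two toy contexts using tags from the text.
contexts = [
    (["Nnom", "Vperf", "Aloc", "PM"], 1),
    (["PM", "Nnom", "Vperf", "PM"], 2),
]
d = positional_distributions(contexts)
print(d[-1])  # Counter({'Nnom': 2})
```

Each `d[i]` is the frequency vector for position i, keyed by tag rather than by a fixed tag index.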

Distributions were compared using the vector space model [10], [11] with cosine similarity. For the i-th position in the window and distributions a and b, the similarity is:

$$\mathrm{sim}(a_i, b_i) = \frac{\sum_{j=1}^{N} a_{ij} \times b_{ij}}{\sqrt{\sum_{j=1}^{N} a_{ij}^{2}} \times \sqrt{\sum_{j=1}^{N} b_{ij}^{2}}} \tag{1}$$

where N is the size of the tag set and a_{ij}, b_{ij} are the frequencies of the j-th tag in the i-th position of distributions a and b.

Tab. 1 shows a fragment of the positional similarity matrix for verbs; similarity is given in percent.

Table 1. A fragment of the positional similarity matrix (%).

| Verb 1 | Verb 2 | all | -10 | -5 | -2 | -1 | 1 | 2 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|---|
| брать 'to take' | мочь 'to be able' | 81 | 95 | 93 | 88 | 57 | 14 | 85 | 94 | 96 |
| брать 'to take' | хотеть 'to wish' | 84 | 97 | 94 | 90 | 63 | 39 | 88 | 94 | 96 |
| брать 'to take' | идти 'to go' | 88 | 96 | 89 | 93 | 90 | 58 | 91 | 92 | 97 |
| брать 'to take' | иметь 'to have' | 91 | 93 | 95 | 94 | 87 | 83 | 85 | 91 | 94 |
| брать 'to take' | казаться 'to appear' | 82 | 93 | 84 | 89 | 86 | 45 | 65 | 92 | 89 |
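Equation (1) can be sketched in Python as below, computing per-position similarity in percent in the manner of Table 1. This is an illustration under stated assumptions, not the authors' implementation: each distribution is taken to be a mapping from tag to frequency (for example a `collections.Counter`), a position with no observations is assigned similarity 0, and values are rounded to whole percent.

```python
import math


def cosine_similarity(a, b):
    """Equation (1): cosine of two tag-frequency vectors at one window position.

    a and b map tags to frequencies; a tag missing from one mapping counts
    as frequency zero, so only shared tags contribute to the dot product.
    """
    dot = sum(freq * b.get(tag, 0) for tag, freq in a.items())
    norm_a = math.sqrt(sum(f * f for f in a.values()))
    norm_b = math.sqrt(sum(f * f for f in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty position has zero similarity
    return dot / (norm_a * norm_b)


def positional_similarity(dists_a, dists_b):
    """Similarity in whole percent for every window position both verbs share.

    Assumption: dists_a and dists_b map window positions to tag-frequency
    mappings; rounding to integers mirrors the presentation of Table 1.
    """
    return {
        i: round(100 * cosine_similarity(dists_a[i], dists_b[i]))
        for i in dists_a
        if i in dists_b
    }


# Toy example: the same two tags in swapped proportions at position 1.
verb_a = {1: {"Nnom": 3, "PM": 4}}
verb_b = {1: {"Nnom": 4, "PM": 3}}
print(positional_similarity(verb_a, verb_b))  # {1: 96}
```

Because the vectors are normalized, the measure compares the shape of two distributions rather than their absolute counts, which suits contexts sampled in equal numbers (200 per verb).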
