
GWC 2008

Verification of Valency Frame Structures by Means of Automatic… 41

The clustering quality can be assessed, which makes it possible to choose the optimal window width (and other parameters). We consider a clustering interpretable if verbs from the same semantic tree are put together in one cluster, and inexplicable if a cluster mixes many representatives of different semantic groups. If there were no morphological correlations among the contexts of different verbs with similar or related meanings, we would never have obtained any explicable groupings.
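Interpretability is judged here by whether verbs of one semantic group end up together. One simple way to turn that judgement into a number is cluster purity, sketched below in Python. This is an assumed measure, not necessarily the one the authors used, and the verbs and group labels in the demo are hypothetical placeholders.

```python
from collections import Counter


def cluster_purity(clusters, semantic_group):
    """Share of verbs that belong to the majority semantic group of their cluster.

    clusters: list of clusters, each a list of verbs
    semantic_group: dict mapping each verb to a manually assigned semantic group
    Returns 1.0 when every cluster is semantically homogeneous and lower values
    the more groups are mixed inside clusters.
    """
    total = sum(len(cluster) for cluster in clusters)
    if total == 0:
        return 0.0
    in_majority = 0
    for cluster in clusters:
        counts = Counter(semantic_group[verb] for verb in cluster)
        if counts:
            # size of the largest semantic group inside this cluster
            in_majority += counts.most_common(1)[0][1]
    return in_majority / total


if __name__ == "__main__":
    # Hypothetical verbs and semantic groups, for illustration only.
    groups = {"run": "motion", "walk": "motion", "say": "speech", "tell": "speech"}
    print(cluster_purity([["run", "walk"], ["say", "tell"]], groups))  # prints 1.0
    print(cluster_purity([["run", "say"], ["walk", "tell"]], groups))  # prints 0.5
```

Scoring the clustering obtained for each candidate window width this way and keeping the best one is one possible way to make the choice of width systematic. Purity trivially rewards many small clusters, so it is only a fair comparison between clusterings with a similar number of clusters.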

Results of [4] show that a range of positions in the window has a cumulative effect. We therefore compared verb distributions for each single position and for position ranges, and obtained the expected result: no single position in the window [-10…+10] was sufficient by itself to yield a reliable clustering. Figs. 3, 4 and 5 show the stemmas for clustering in the [-10] position, in the [+1] position and in the better-performing range [-3,+5]. A grey background marks verbs from the same semantic group; the numbers at the top nodes show the clustering step and the average similarity.
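The comparison of single positions with position ranges can be sketched as follows, under several assumptions not stated in this section: the corpus is already reduced to (lemma, tag) pairs; each verb is represented by relative tag frequencies at the chosen offsets, with the offset kept as part of the feature so that a range simply accumulates per-position counts; verbs are compared by cosine similarity and merged by average linkage. The function names and the toy corpus are hypothetical; the merge history records the step number and average similarity of the kind written at the top nodes of a stemma.

```python
import math
from collections import Counter
from itertools import combinations


def distribution(corpus, verb, positions):
    """Relative tag frequencies at the given offsets around each occurrence of `verb`.

    corpus: list of (lemma, tag) pairs (assumed input format)
    positions: offsets from the verb, e.g. [-10], [1] or range(-3, 6) for [-3,+5]
    Each feature is an (offset, tag) pair, so a range accumulates per-position counts.
    """
    counts = Counter()
    for i, (lemma, _tag) in enumerate(corpus):
        if lemma != verb:
            continue
        for off in positions:
            j = i + off
            # skip the verb itself and offsets falling outside the corpus
            if off != 0 and 0 <= j < len(corpus):
                counts[(off, corpus[j][1])] += 1
    total = sum(counts.values())
    return {feat: c / total for feat, c in counts.items()} if total else {}


def cosine(p, q):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(val * q.get(feat, 0.0) for feat, val in p.items())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def agglomerate(vectors):
    """Average-linkage agglomerative clustering of {verb: vector}.

    Returns (step, merged verbs, average similarity) for every merge.
    """
    clusters = [(verb,) for verb in vectors]
    history = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # average similarity over all verb pairs across the two clusters
            sims = [cosine(vectors[x], vectors[y]) for x in clusters[a] for y in clusters[b]]
            avg = sum(sims) / len(sims)
            if best is None or avg > best[0]:
                best = (avg, a, b)
        avg, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
        history.append((len(history) + 1, merged, avg))
    return history


if __name__ == "__main__":
    # Toy tagged corpus (hypothetical data, not from the paper).
    corpus = [
        ("he", "Pron"), ("run", "V"), ("home", "Adv"), (".", "Punct"),
        ("she", "Pron"), ("walk", "V"), ("home", "Adv"), (".", "Punct"),
        ("they", "Pron"), ("say", "V"), ("that", "Conj"), ("it", "Pron"),
    ]
    window = range(-3, 6)  # the [-3,+5] range
    vecs = {v: distribution(corpus, v, window) for v in ("run", "walk", "say")}
    for step, members, sim in agglomerate(vecs):
        print(step, members, round(sim, 3))
```

Passing a single offset such as `[-10]` or `[1]` instead of a range reproduces the per-position variant of the comparison with the same code.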

5 An Optimal Tag Set for Distribution Capture

The tag set (TS) is another key factor in describing a distribution. The first variant of the tag set produced interpretable results; however, it was important to find out to what extent the choice of tag set may affect the clustering. We tried three tag sets: the 1st TS is the one described above; the 2nd TS was simple POS tagging without grammatical category values (e.g. N, A, Adv, V, Pron, etc.); the 3rd TS was a kind of POS tag generalisation in which all substantives were merged under one tag, with a case specification added for them (Nom, Gen, Dat, Acc, Abl, Loc, V, Adv, etc.). A comparison of the clustering parameters shows that clustering with the 2nd TS produces a flatter structure, without elaboration of the inner structure of groups, whereas clustering with the 3rd TS creates a more detailed structure than the 1st TS.
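The 2nd and 3rd tag sets can be illustrated as mappings from a full morphological tag to coarser tags. The sketch below assumes a hypothetical dot-separated tag format with the part of speech first (e.g. `N.m.sg.Gen`); which parts of speech count as substantives (here nouns and pronouns) is also an assumption, not something this section specifies.

```python
# Assumed 1st-TS tag format: part of speech first, then grammatical values,
# separated by dots, e.g. "N.m.sg.Gen" (hypothetical, for illustration only).
CASES = {"Nom", "Gen", "Dat", "Acc", "Abl", "Loc"}
SUBSTANTIVE_POS = {"N", "Pron"}  # assumption: which parts of speech count as substantives


def to_ts2(tag):
    """2nd TS: bare part of speech, all grammatical category values dropped."""
    return tag.split(".")[0]


def to_ts3(tag):
    """3rd TS: substantives are replaced by their case; other words keep their POS."""
    pos, *values = tag.split(".")
    if pos in SUBSTANTIVE_POS:
        for value in values:
            if value in CASES:
                return value
    return pos


if __name__ == "__main__":
    tags = ["Pron.3.sg.Nom", "V.past.sg", "N.f.sg.Acc", "Adv"]
    print([to_ts2(t) for t in tags])  # ['Pron', 'V', 'N', 'Adv']
    print([to_ts3(t) for t in tags])  # ['Nom', 'V', 'Acc', 'Adv']
```

Applying one of these mappings to every token before collecting context distributions is enough to rerun an otherwise unchanged clustering with a coarser tag set.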

Fig. 5. The stemma of verb clustering in the [-3,+5] position range of the distribution window with the 1st TS
