The morphological productivity of selected ... - Helda - Helsinki.fi
The morphological productivity of selected ... - Helda - Helsinki.fi
The morphological productivity of selected ... - Helda - Helsinki.fi
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
<strong>fi</strong>gure. 14 A sample is picked randomly, the number <strong>of</strong> types or hapaxes is<br />
calculated, and the sample is plotted on a <strong>fi</strong>gure (see Figure 1). Another (random)<br />
sample is picked and added to the previous one, until the whole corpus has been<br />
sampled. <strong>The</strong> procedure is then repeated, for example, a million times with the<br />
help <strong>of</strong> a suitable computer s<strong>of</strong>tware, and the permutation, i.e., the order <strong>of</strong> the<br />
samples changes every time. <strong>The</strong> result looks something like Figure 2: it is<br />
possible to indicate visually the area covered by a certain percentage <strong>of</strong> the<br />
curves, which helps us to see the extent to which the <strong>productivity</strong> rates <strong>of</strong> a certain<br />
feature in a sub-corpus differ from those <strong>of</strong> the corpus as a whole (Säily 2011:<br />
125).<br />
According to Säily, type accumulation curves are a particularly good<br />
measure in comparing <strong>productivity</strong> between different (e.g., gender-speci<strong>fi</strong>c)<br />
subcorpora (2011: 127, 135). <strong>The</strong>re are two ways to plot a type accumulation<br />
curve: either the number <strong>of</strong> types as a function <strong>of</strong> corpus size, or the number <strong>of</strong><br />
hapaxes as a function <strong>of</strong> the number <strong>of</strong> types. Hapax accumulation curves,<br />
however, require a large corpus in order to produce statistically valuable data in<br />
which the results are not skewed (Säily 2011).<br />
14 <strong>The</strong> curves can be plotted either for types (as a function <strong>of</strong> token frequencies) or hapaxes (as a<br />
function <strong>of</strong> running words in the corpus) (see Säily 2011: 126).<br />
28