16.01.2014 Views

The morphological productivity of selected ... - Helda - Helsinki.fi


figure.¹⁴ A sample is picked randomly, the number of types or hapaxes is calculated, and the sample is plotted in a figure (see Figure 1). Another (random) sample is picked and added to the previous one, until the whole corpus has been sampled. The procedure is then repeated, for example, a million times with the help of suitable computer software, and the permutation, i.e., the order of the samples, changes every time. The result looks something like Figure 2: it is possible to indicate visually the area covered by a certain percentage of the curves, which helps us to see the extent to which the productivity rates of a certain feature in a subcorpus differ from those of the corpus as a whole (Säily 2011: 125).
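The permutation procedure described above can be sketched in Python as follows. This is only a minimal illustration, not the software used by Säily (2011): the function names, the default of 1,000 permutations, and the representation of each sample as a list of word forms of the feature under study are all assumptions made for the example. For simplicity, the band is computed per step (number of samples added so far) rather than per running word.

```python
import random


def type_accumulation(samples):
    """Return cumulative (tokens, types) after each sample, in the given order."""
    seen = set()
    tokens = 0
    curve = []
    for sample in samples:
        tokens += len(sample)
        seen.update(sample)
        curve.append((tokens, len(seen)))
    return curve


def permuted_curves(samples, n_permutations=1000, seed=0):
    """Build one accumulation curve per random ordering of the samples."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    order = list(samples)
    curves = []
    for _ in range(n_permutations):
        rng.shuffle(order)  # a new permutation of the sample order each round
        curves.append(type_accumulation(order))
    return curves


def type_count_band(curves, step, coverage=0.95):
    """Lower and upper type counts enclosing roughly `coverage` of the curves
    at a given step (hypothetical helper; a simple sort-and-trim percentile)."""
    counts = sorted(curve[step][1] for curve in curves)
    cut = int(len(counts) * (1 - coverage) / 2)
    return counts[cut], counts[len(counts) - 1 - cut]
```

A subcorpus whose own type count at a comparable size falls outside the band returned by `type_count_band` would, under these assumptions, differ from what random orderings of the whole corpus typically produce.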

According to Säily, type accumulation curves are a particularly good measure for comparing productivity between different (e.g., gender-specific) subcorpora (2011: 127, 135). There are two ways to plot a type accumulation curve: either the number of types as a function of corpus size, or the number of hapaxes as a function of the number of types. Hapax accumulation curves, however, require a large corpus in order to produce statistically meaningful results that are not skewed (Säily 2011).
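To make the quantities behind these curves concrete, the following Python sketch counts types and hapaxes (types that occur exactly once) as a token sequence grows. The function names and the toy word list are assumptions introduced for illustration; the sketch simply records corpus size, type count, and hapax count at every step, so that either kind of curve could be drawn from its output.

```python
from collections import Counter


def types_and_hapaxes(tokens):
    """Return (number of types, number of hapaxes) for a token list."""
    freq = Counter(tokens)
    n_types = len(freq)
    n_hapaxes = sum(1 for count in freq.values() if count == 1)
    return n_types, n_hapaxes


def accumulation_points(tokens):
    """Return (corpus size, types, hapaxes) after each running token."""
    freq = Counter()
    hapaxes = 0
    points = []
    for size, token in enumerate(tokens, start=1):
        freq[token] += 1
        if freq[token] == 1:
            hapaxes += 1  # first occurrence: a new hapax
        elif freq[token] == 2:
            hapaxes -= 1  # second occurrence: no longer a hapax
        points.append((size, len(freq), hapaxes))
    return points
```

With a very small input, the hapax count moves up and down sharply from one token to the next, which illustrates why hapax-based curves need a large corpus before they stabilize.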

¹⁴ The curves can be plotted either for types (as a function of token frequencies) or hapaxes (as a function of running words in the corpus) (see Säily 2011: 126).
