16.01.2014 Views

The morphological productivity of selected ... - Helda - Helsinki.fi

The morphological productivity of selected ... - Helda - Helsinki.fi

The morphological productivity of selected ... - Helda - Helsinki.fi

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

the BNC is the fact that since it does not cover the last twenty years or so, the<br />

most recent changes or trends in the English word-formation are not covered.<br />

However, in this study, the use <strong>of</strong> the BNC is justi<strong>fi</strong>ed, because it has already been<br />

used in many studies on the <strong>productivity</strong> <strong>of</strong> various af<strong>fi</strong>xes. This enables the<br />

comparison <strong>of</strong> the <strong>productivity</strong> rates to the ones calculated for the <strong>selected</strong><br />

combining forms. When necessary, and especially to provide supplementary<br />

information on the behaviour <strong>of</strong> these combining forms, the Corpus <strong>of</strong><br />

Contemporary American English (COCA) will also be consulted, especially in the<br />

discussion section.<br />

It must be noted, however, that a corpus-based <strong>productivity</strong> measure<br />

is not devoid <strong>of</strong> problems, either. Lüdeling et al. criticize Baayen for assuming<br />

that all electronic corpora are automatically perfectly prepared (2000: 59). <strong>The</strong>re<br />

are, however, at least three corpus pre-processing problems that must be taken<br />

into consideration: mistagged items, typographical errors, and repetition <strong>of</strong> the<br />

same texts in corpus composition. Similar problems are even more common in<br />

historical data, since historical corpora <strong>of</strong>ten contain several spelling variants <strong>of</strong><br />

the same af<strong>fi</strong>xes (see Säily and Suomela 2009: 93). Tagging is not a problem in<br />

the present study, since we are only interested in the internal make-up <strong>of</strong> lexemes.<br />

Even though the BNC has been claimed to have some inherent structural<br />

problems, such as wrongly classi<strong>fi</strong>ed texts and overly broad categories (see Lee<br />

2001), the sampling has been done carefully (see Burnard 2007: section 1.4.4 for<br />

the selection procedures employed), so that the repetition <strong>of</strong> texts is highly<br />

unlikely.<br />

Typographical errors are, however, a problem that must be taken<br />

seriously. Furthermore, errors <strong>of</strong>ten occur only once and consequently add to the<br />

number <strong>of</strong> hapaxes, and may therefore considerably affect the results (Lüdeling et<br />

al. 2000: 59). Plag expresses the same kind <strong>of</strong> concern, adding that some<br />

qualitative decisions are necessary even in purely quantitative analysis (1999: 29).<br />

Thus, in order to properly analyse the data for the present study, a certain amount<br />

<strong>of</strong> manual sorting was necessary.<br />

In general, corpora constitute a better source for the purposes <strong>of</strong> the<br />

present study. However, as has been shown, corpora as a source are far from<br />

unproblematic. Even the largest, most balanced and representative corpus will<br />

never equal real language use, even though it may provide a useful tool to make<br />

44

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!