certain estimations or predictions based on statistics. As Baroni and Evert have

put it, “corpora are finite samples from the infinite set that constitutes a

language” (2008: 777). These samples let us make generalizations and inferences,

but this requires that the problem at hand is accurately operationalized in

quantitative terms (Baroni and Evert 2008: 777–778). A good strategy in the study

of productivity is to combine corpus and dictionary evidence to provide a

comprehensive view of the productivity of different patterns (see Plag 2006).

3.3. How to determine units of analysis and other methodological problems

One has to be careful when making inferences based on corpus evidence.

Lüdeling et al. claim that even seemingly error-free corpora may cause trouble in

this respect: for example, words may accidentally contain strings of characters

that look like affixes but are not (2000: 59). For instance, not all words that begin

with sub- actually contain the prefix sub- in contemporary English (e.g., subtle),

even if they might be derived from a word that was morphologically complex in

the donor language (Latin sub ‘under’ + tēla ‘web’) (Plag 2006: 543). In

psycholinguistics, this phenomenon is referred to as pseudo-affixation (see, e.g.,

Baayen 1993: 199). Another good example of pseudo-affixation is the status of

the constituent -fer in such words as infer, confer, prefer and transfer – it might be

analysed as a bound root, but it does not carry a meaning that would be the same

in all these words (see Plag 2003: 24–25). In the context of the present study,

examples of words that seem to consist of several elements but that actually are

mono-morphemic include hyperion and hyperoid. Therefore, they are excluded

from the type count.
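
The exclusion step described above can be sketched in code. The following is a minimal illustration only, not the procedure actually used in the study: the function name type_count, the exclusion set and the toy token list are all hypothetical, and a real analysis would rest on manual inspection of each candidate form.

```python
from collections import Counter

# Hypothetical exclusion list: forms that look prefixed but are treated as
# mono-morphemic (the two examples are taken from the text above).
PSEUDO_PREFIXED = {"hyperion", "hyperoid"}


def type_count(tokens, prefix, exclusions=PSEUDO_PREFIXED):
    """Count the types beginning with `prefix`, skipping excluded forms.

    Returns the number of types and a Counter of token frequencies per type.
    Matching is case-insensitive and purely string-based, so it cannot
    detect pseudo-affixation on its own; that is what the list is for.
    """
    freqs = Counter(
        t.lower()
        for t in tokens
        if t.lower().startswith(prefix) and t.lower() not in exclusions
    )
    return len(freqs), freqs


# Toy sample invented for illustration; it is not real corpus data.
sample = ["hyperactive", "Hyperion", "hypertext",
          "hyperactive", "hyperoid", "subtle"]
n_types, freqs = type_count(sample, "hyper")
```

On this toy sample the count comes to two types (hyperactive and hypertext), since Hyperion and hyperoid are filtered out and subtle does not match the prefix.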

A similar issue is that many affixes have acquired

idiosyncrasies that may blur their morphological make-up: for example, the ending

-ity occurs in many transparent, complex words, but also in words like

entity, which might not be considered complex (see also Plag 1999: 28). In a

generous count, this word would probably be counted as a unit of analysis and

included in the type count, but in that case it would skew the productivity rate of a

modern word-formation process.


