NOTE

Negative evidence and the raw frequency fallacy*

ANATOL STEFANOWITSCH

Introduction

There is little that is more completely accepted in the conventional wisdom
of modern linguistics than the assumption that corpora do not
contain negative evidence and that, therefore, intuition-based acceptability
judgments are an indispensable part of linguistic methodology.
This assumption goes back at least to Chomsky’s discussion of grammaticality
in Syntactic Structures (Chomsky 1957: 15 ff.), whose claims
can perhaps be excused on the basis that he was writing before the advent
of modern corpus linguistics. More worrying, however, is that many
modern corpus linguists still share this assumption.

For example, in what is otherwise one of the most thorough and most
thoughtful textbooks on corpus linguistics currently available, McEnery
and Wilson (2001: 11) ask:

Without recourse to introspective judgments, how can ungrammatical
utterances be distinguished from ones that simply haven’t occurred
yet? If our finite corpus does not contain the sentence:

*He shines Tony books.

how do we conclude that it is ungrammatical?

And without any discussion of potential alternatives they promptly
give the following answer (McEnery and Wilson 2001: 12):

It is only by asking a native or expert speaker of a language for their
opinion of the grammaticality of a sentence that we can hope to differentiate
unseen but grammatical constructions from those which are
simply ungrammatical.

They conclude their discussion by stating that “we [corpus linguists]
must not eschew introspection entirely. If we do, detecting ungrammatical
structures and ambiguous structures becomes difficult and, indeed,
may be impossible” (McEnery and Wilson 2001: 12).

Corpus Linguistics and Linguistic Theory 2-1 (2006), 61-77
DOI 10.1515/CLLT.2006.003
1613-7027/06/0002-0061
Walter de Gruyter



In this note, I would like to take issue with (a large part of) their
argument. I will argue that the idea that corpora do not contain negative
evidence is simply a special case of what I have termed the observed-frequency
(or raw-frequency) fallacy, i.e., the belief that “[o]bserved frequencies
of occurrence represent relevant facts for scientific analysis”
(Stefanowitsch 2005: 296). When approached with the right methodological
tools, corpora do provide negative evidence, i.e., evidence that
allows us, in principle, to distinguish between constructions that did not
occur but could have (these could be referred to as ‘accidentally absent’
structures) and constructions that did not occur and could not have (these
can be referred to as ‘significantly absent’ structures). Thus, while I do agree
that linguists cannot (and should not) ‘eschew introspection entirely’, I
will argue that they can (and largely should) eschew introspective judgments
of acceptability.

Collostructional analysis and the significance of absence

In this section, I will address the general issue of how significant absences
of a particular configuration of linguistic elements can be distinguished
from accidental ones, using as an example the ‘ability’ or ‘inability’ of
English verbs to occur with ditransitive complementation. The choice of
this example is motivated primarily by practical considerations: as will
presently become clear, the method I will use requires the researcher to
extract exhaustively from a corpus all occurrences of the grammatical
phenomenon in question. Ditransitive complementation happens to be
one of the features that is relatively uncontroversially tagged in the
largest grammatically annotated balanced corpus currently available, the
British component of the International Corpus of English (ICE-GB, cf.
Nelson et al. 2002). However, it is a welcome coincidence that this is
precisely the complementation pattern that McEnery and Wilson chose
to demonstrate the need for grammaticality judgments. 1

The relevant method is one of several that Gries and I have developed
in a series of publications specifically for the purpose of investigating
the relationship between grammatical constructions and the words occurring
in them, and that we refer to collectively as collostructional
analysis (cf. e.g., Stefanowitsch and Gries 2003, 2005, to appear a; Gries
and Stefanowitsch 2004a, b, to appear). 2 The most basic of these methods,
simple collexeme analysis, allows the researcher to identify words
that occur significantly more or less frequently than expected in a given
slot of a construction. This is done on the basis of a standard 2-by-2
contingency table containing four observed frequencies: (a) the frequency
of a given word in a particular slot of a given construction, (b)
the frequency of the same word in the corresponding slots of all other
constructions, (c) the frequency of all other words in the relevant slot of
the construction under investigation, and (d) the frequency of all other
words in the corresponding slot of all other constructions. From these
frequencies, we can derive the expected frequency of occurrence of the
word in the construction, which allows us to determine whether and in
what direction the observed frequency deviates from the expected frequency
and whether this deviation is statistically significant. As an example,
consider Table 1, which shows the relevant contingency table for the
verb give and the ditransitive complementation pattern in the British
Component of the International Corpus of English (ICE-GB) (expected
frequencies are shown in parentheses).

Table 1. Give with ditransitive complementation in the ICE-GB

          Ditransitive         ¬Ditransitive            Total
give      560 (14.57)          531 (1,076.43)           1,091
¬give     1,264 (1,809.43)     134,196 (133,650.57)     135,460
Total     1,824                134,727                  136,551

As Table 1 shows, give occurs vastly more frequently than expected
with ditransitive complementation; the Fisher-Yates exact test shows that
this difference is highly significant (p < 4.94e-324, the smallest number
a typical current home-issue computer can handle). In collostructional
analysis, we usually take the p-value directly as a measure of association
strength (cf. Pedersen 1996 and Stefanowitsch and Gries 2003: 238 f. for
justification). In other words, the extremely small p-value is taken to be
an indication of an extremely strong association between give and the
ditransitive complementation pattern.
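The arithmetic behind Table 1 can be sketched in a few lines of Python. This is only an illustration, not the software used for the study: the function names are invented here, the test is computed as a one-tailed hypergeometric tail (the text does not say which tail convention was used), and the p-value is kept on a log10 scale because it is too small to be represented as an ordinary floating-point number.

```python
import math

def expected_frequencies(a, b, c, d):
    """Expected counts for a 2-by-2 table, in the same cell order as the input.

    a = word in the construction, b = word in other constructions,
    c = other words in the construction, d = other words elsewhere.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    return (row1 * col1 / n, row1 * col2 / n,
            row2 * col1 / n, row2 * col2 / n)

def log_comb(m, r):
    """Natural log of the binomial coefficient C(m, r)."""
    return math.lgamma(m + 1) - math.lgamma(r + 1) - math.lgamma(m - r + 1)

def log10_upper_tail(a, b, c, d):
    """log10 of P(X >= a) under the hypergeometric distribution.

    Assumption (hypothetical helper): a one-tailed exact test for attraction.
    """
    n, row1, col1 = a + b + c + d, a + b, a + c
    logs = [log_comb(col1, k) + log_comb(n - col1, row1 - k) - log_comb(n, row1)
            for k in range(a, min(row1, col1) + 1)]
    peak = max(logs)  # log-sum-exp keeps the tiny terms from underflowing
    total = peak + math.log(sum(math.exp(x - peak) for x in logs))
    return total / math.log(10)

# Observed frequencies for give, taken from Table 1.
give = (560, 531, 1264, 134196)
print([round(x, 2) for x in expected_frequencies(*give)])
print(log10_upper_tail(*give))
```

Working in logarithms sidesteps the underflow limit mentioned in the text, so that even extremely strong associations can still be ranked against each other.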

Repeating this procedure for all verbs occurring with ditransitive complementation
in the ICE-GB allows us to rank all verbs first, by whether
they occur more or less frequently than expected, and second, by association
strength. Words that occur more frequently than expected are referred
to as attracted collexemes (the strength of their positive association
can be referred to as attraction strength), words that occur less frequently
are referred to as repelled collexemes (with a corresponding repulsion
strength). For example, all verbs occurring significantly more frequently
than expected are shown in Table 2. The significance level of 0.05
was corrected for multiple testing using a simple Bonferroni correction
(Bonferroni 1936) whereby the significance level is divided by the
number of tests. Since the ICE-GB contains 4,856 verb types, this gives
us 0.05/4,856 = 1.03E-05. 3
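The correction itself is a single division. As a minimal sketch (the function name is invented for illustration), the corrected level and a check against it might look like this:

```python
def bonferroni_threshold(alpha, n_tests):
    """Significance level after a simple Bonferroni correction."""
    return alpha / n_tests

# 0.05 divided by the 4,856 verb types in the ICE-GB, as in the text.
threshold = bonferroni_threshold(0.05, 4856)
print(f"{threshold:.2E}")  # prints 1.03E-05

# A p-value counts as significant at the corrected level only if it is
# smaller than this threshold, e.g. the value reported for 'give back'.
print(9.42e-06 < threshold)  # prints True
```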



Table 2. Significantly attracted collexemes of the ditransitive in the ICE-GB

Collexeme    F(Corpus)   F_O(Ditr)   F_E(Ditr)   FYE p-value
give         1,091       560         14.57       0.00E+000
tell         792         493         10.58       0.00E+000
send         295         78          3.94        4.13E-076
ask          504         92          6.73        9.65E-074
show         628         84          8.39        5.15E-056
offer        196         54          2.62        3.73E-054
convince     32          23          0.43        1.70E-036
cost         65          23          0.87        9.04E-027
inform       55          20          0.73        9.57E-024
teach        92          23          1.23        7.94E-023
assure       19          13          0.25        1.04E-020
remind       41          16          0.55        7.25E-020
lend         31          12          0.41        3.48E-015
promise      43          12          0.57        3.26E-013
owe          25          9           0.33        2.24E-011
grant        26          9           0.35        3.38E-011
warn         38          10          0.51        5.94E-011
award        16          7           0.21        7.72E-010
persuade     33          8           0.44        1.03E-008
allow        326         20          4.35        2.59E-008
guarantee    27          7           0.36        5.27E-008
deny         51          8           0.68        3.82E-007
earn         56          8           0.75        8.03E-007
hand         16          5           0.21        1.63E-006
pay          395         18          5.28        8.66E-006
give back    4           3           0.05        9.42E-006

The list of verbs in this table could now serve as a basis for a variety
of observations, for example about the meaning of the ditransitive complementation
pattern. I will not pursue this issue here (but cf. Stefanowitsch
and Gries 2003, Section 3.2.2). 4 Instead, let me point out two
facts about the way that the label ‘ditransitive’ is applied in the ICE-GB.
First, structures with nominal and with clausal direct objects are included
under this label (i.e., uses like She told me that she wants to be
free of lawyers and doctors [ICE-GB s2a062 133] or I told him to drive
the forklift truck [ICE-GB s2a067 050] as well as the more obvious I’ve
told you the truth [ICE-GB w2 f.006 213]). Second, some verbs are
tagged as ditransitive whose second object might be better analyzed as
an oblique argument (e.g., cost, as in It cost them three quid [ICE-GB
s1a007 054]). 5 In other words, the label is applied rather generously.

Next, consider Table 3, which shows the significantly repelled collexemes,
sorted by repulsion strength (only the first two are significant at
corrected levels).

Table 3. Significantly repelled collexemes of the ditransitive in the ICE-GB

Collexeme    F(Corpus)   F_O(Ditr)   F_E(Ditr)   FYE p-value
make         1,865       3           24.91       3.39E-008
do           2,937       12          39.23       2.56E-007
find         854         2           11.41       7.96E-004
call         616         1           8.23        2.32E-003
keep         374         1           5.00        3.95E-002

One could now ask why verbs that occur in a given construction might
do so less frequently than expected. There are several reasons, some of
them more interesting than others. First, verbs may appear on this list
because they are incorrectly tagged (in this case, call, which is tagged as
‘ditransitive’ in the utterance And the person who’s being called [ICE-GB
s1a030 003]). Obviously, such incorrect tags are hard to eliminate
completely once a corpus reaches a certain size. Second, some verbs
appear on this list because their ditransitive uses are very restricted, in
some cases to a single fixed expression (in this case, keep s.o. company).
Finally, most verbs appear on this list because they occur very frequently
with other complementation patterns (this is most obvious for the high-frequency
verbs make and do, but it is also true of find). What one can
take away from a discussion of such cases is, first, that fixed expressions
must be taken into account in any linguistic analysis, and second, that
complementation patterns exhibit a certain amount of productivity, occurring
at least occasionally with verbs whose dominant patterns are
others (both facts are unsurprising from the perspective of construction
grammar, in which collostructional analysis first developed).

However, the data in Table 3 do not speak directly to the issue of
negative evidence yet: a further step is necessary. In our previous work,
we have referred to as ‘repelled’ only those words which do occur in a
given construction but do so less frequently than expected; however, as
we noted in passing in our first paper (cf. Stefanowitsch and Gries 2003:
238), it is possible, and perhaps logical, to include in this category
words that would have been expected to occur in the construction based
on their overall frequency in the corpus, but did not, in fact, occur in
the construction at all. This is the step that finally takes us to the issue
of negative evidence: the range of frequencies of occurrence that can be
evaluated for statistical significance includes the limiting case of zero; in
other words, the non-occurrence of a particular configuration of linguistic
categories (for example, of a particular verb in a particular construction)
can be compared to its expected frequency of occurrence. This will
allow us, in many cases, to determine whether an unseen construction is
likely to be a possible construction of a language or not.

Consider Table 4, which shows the contingency tables for three verbs
that do not occur with ditransitive complementation in the ICE-GB, say,
explain, and whisper.

Table 4. Three verbs that do not occur with ditransitive complementation in the ICE-GB

a. say

            Ditransitive    ¬Ditransitive    Total
say         0 (44.52)       3,333            3,333
¬say        1,824           131,394          133,218
Total       1,824           134,727          136,551

b. explain

            Ditransitive    ¬Ditransitive    Total
explain     0 (2.30)        172              172
¬explain    1,824           134,555          136,379
Total       1,824           134,727          136,551

c. whisper

            Ditransitive    ¬Ditransitive    Total
whisper     0 (0.07)        5                5
¬whisper    1,824           134,722          136,546
Total       1,824           134,727          136,551

On a priori grounds, we might expect all three verbs to allow ditransitive
complementation, since they are all reasonably close in meaning to
one of the most strongly attracted collexemes of this pattern, tell (and
other verbs of communication occurring among the significantly
attracted collexemes; e.g. ask, inform, teach, assure). On the other hand,
they are textbook cases in the linguistic literature of verbs not allowing
ditransitive complementation (cf. e.g., Pinker 1989).

Table 4a provides conclusive evidence that the linguistic literature is
right in the case of say, whose repulsion strength meets the corrected
level of significance (p = 1.96E-20; < 1.03E-05). We can confidently
claim that the combination [say + ditransitive] is significantly absent. In
the case of explain, the repulsion strength does not meet even the uncorrected
level (p = 0.099; > 0.05), although it is not too far off. The verb is simply
not frequent enough in the ICE-GB to let us determine whether its non-occurrence
is accidental or significant, although its marginal statistical
significance may lead us to suspect the latter. No such suspicion would
be warranted in the case of whisper, whose non-occurrence is well within
the range of accidental variation (p = 0.935; > 0.05).
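When the observed frequency is zero, the exact test reduces to a single hypergeometric probability: the chance that none of a verb's tokens falls among the ditransitives, given the marginal totals. The following sketch (the function name is chosen here for illustration, and a one-tailed test is assumed) recomputes the three cases in Table 4 under that reading:

```python
def p_zero_occurrences(word_freq, construction_freq, corpus_size):
    """P(X = 0) under the hypergeometric distribution.

    Probability that none of the word's tokens occurs in the construction,
    given the word's corpus frequency, the construction's frequency, and
    the total number of complementation patterns in the corpus.
    """
    p = 1.0
    for i in range(word_freq):
        p *= (corpus_size - construction_freq - i) / (corpus_size - i)
    return p

# Totals from Table 4: 1,824 ditransitives among 136,551 patterns.
for verb, freq in [("say", 3333), ("explain", 172), ("whisper", 5)]:
    expected = freq * 1824 / 136551
    p = p_zero_occurrences(freq, 1824, 136551)
    print(f"{verb:8s} expected = {expected:6.2f}   p = {p:.3g}")
```

The only inputs are the three totals, which is why the outcome depends so heavily on how frequent the verb is overall.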

Before discussing these issues any further, let us take a look at the
results we get when we apply simple collexeme analysis exploratively to
all verbs that occur in the ICE-GB but not in the ditransitive. There are
4,856 verb types in the ICE-GB (according to my definition, which lists
phrasal verbs as separate types, see Footnote 4). Of these, 4,782 do not
occur in the ditransitive. In turn, this non-occurrence is significant only
for 53 verbs (of which only 11 meet the corrected level of significance).
Table 5 shows the significantly repelled collexemes.

Table 5. Significantly repelled collexemes of the ditransitive in the ICE-GB

Collexeme     F(Corpus)   F_O(Ditr)   F_E(Ditr)   FYE p-value
be            25,416      0           340.00      4.29E-165
be|have       6,261       0           83.63       3.66E-038
have          4,303       0           57.48       2.90E-026
think         3,335       0           44.55       1.90E-020
say           3,333       0           44.52       1.96E-020
know          2,120       0           28.32       3.32E-013
see           1,971       0           26.33       2.54E-012
go            1,900       0           25.38       6.69E-012
want          1,256       0           16.78       4.27E-008
use           1,222       0           16.32       6.77E-008
come          1,140       0           15.23       2.06E-007
look          1,099       0           14.68       3.59E-007

Significant at uncorrected significance levels:

try           749         0           10.00       4.11E-005
mean          669         0           8.94        1.21E-004
work          646         0           8.63        1.65E-004
like          600         0           8.01        3.08E-004
feel          593         0           7.92        3.38E-004
become        577         0           7.71        4.20E-004
happen        523         0           6.99        8.70E-004
put           513         0           6.85        9.96E-004
talk          490         0           6.55        1.36E-003
hear          483         0           6.45        1.49E-003
need          420         0           5.61        3.49E-003
believe       397         0           5.30        4.76E-003
provide       380         0           5.08        5.99E-003
live          378         0           5.05        6.16E-003
remember      373         0           4.98        6.59E-003
produce       328         0           4.38        1.21E-002
speak         323         0           4.31        1.29E-002
hope          316         0           4.22        1.42E-002
run           309         0           4.13        1.56E-002
change        306         0           4.09        1.63E-002
meet          303         0           4.05        1.69E-002
help          301         0           4.02        1.74E-002
start         294         0           3.93        1.91E-002
move          291         0           3.89        1.99E-002
seem          285         0           3.81        2.16E-002
agree         279         0           3.73        2.34E-002
lead          271         0           3.62        2.60E-002
expect        265         0           3.54        2.82E-002
consider      264         0           3.53        2.86E-002
suggest       259         0           3.46        3.06E-002
describe      259         0           3.46        3.06E-002
decide        259         0           3.46        3.06E-002
understand    250         0           3.34        3.46E-002
hold          249         0           3.33        3.50E-002
require       244         0           3.26        3.75E-002
involve       242         0           3.23        3.85E-002
suppose       241         0           3.22        3.90E-002
include       236         0           3.15        4.17E-002
occur         233         0           3.11        4.35E-002
develop       233         0           3.11        4.35E-002
go on         231         0           3.09        4.46E-002
follow        227         0           3.03        4.71E-002

Two things about this table require discussion. First, it demonstrates
that even a one-million-word corpus is too small to allow us to identify
significant absences for more than a handful of cases (at least for a
relatively rare pattern such as ditransitive complementation). I will discuss
this problem in the remainder of this section and in the next section.
Second, the results only tell us that a particular structure is significantly
absent; they do not, as pointed out in the introduction, tell us why it is
significantly absent. I will return to this problem in the final section.
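The size limitation can be made concrete by asking how frequent a verb must be in the ICE-GB before its complete absence from the ditransitive becomes significant. The sketch below is an illustration rather than part of the original analysis: the function name is invented, and it assumes the same one-tailed zero-occurrence test and the same corpus totals as the tables above.

```python
def min_frequency_for_significance(construction_freq, corpus_size, alpha):
    """Smallest corpus frequency at which zero occurrences yields p < alpha.

    The running product is P(X = 0) for a word of the current frequency,
    extended by one token per iteration.
    """
    p = 1.0
    freq = 0
    while p >= alpha:
        p *= (corpus_size - construction_freq - freq) / (corpus_size - freq)
        freq += 1
    return freq

# ICE-GB totals as used above: 1,824 ditransitives among 136,551 patterns.
uncorrected = min_frequency_for_significance(1824, 136551, 0.05)
corrected = min_frequency_for_significance(1824, 136551, 0.05 / 4856)
print(uncorrected, corrected)
```

Any verb rarer than the first value cannot reach even the uncorrected level in a corpus of this size, however strongly it resists the pattern.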

The problem of insufficient corpus size can ultimately only be solved
by the creation of larger grammatically annotated corpora. However, in
many individual cases it is possible to arrive at a fairly safe conclusion
using currently available non-annotated corpora. Take the case of explain.
In the 100-million-word British National Corpus (the largest balanced
corpus of British English currently available), the verb explain
occurs 18,334 times, but not once with ditransitive complementation. In
Stefanowitsch and Gries (2003: 219), we estimated that the BNC contains
10,206,300 complementation patterns overall. If we assume that
the proportion of ditransitives in the BNC is the same as in the ICE-GB,
then the BNC should contain 136,332 ditransitives. Given these
figures, we can now calculate the expected frequency of occurrence of
explain in the ditransitive: 245. The difference between this and the observed
frequency of zero is highly significant (at uncorrected levels of
significance; p = 6.73E-108; < 0.001). Thus, the combination [explain +
ditransitive] can be categorized as a significantly absent structure
based on negative corpus evidence. This strategy works even with a low-frequency
verb like whisper, which occurs only 2,976 times in the BNC,
but, again, does not occur in the ditransitive. Under the assumptions
just outlined, the expected frequency of [whisper + ditransitive] is 40.
Again, the difference is highly significant (at uncorrected levels;
p = 4.139019E-18; < 0.001). 6 Repeating individual tests on a larger
corpus will, of course, not invariably lead to the conclusion that a given
structure is significantly absent. In many cases, the frequency of a verb
will remain too low to yield significant results. For example, the verb
oxidise, like whisper, occurs five times in the ICE-GB, never in the ditransitive.
In the BNC, it occurs 99 times, also never in the ditransitive. The
expected frequency in this case is 1 (1.32, to be precise), and the difference
between this and the observed frequency of zero is still far too
small to reach statistical significance (p = 0.26; > 0.05). In other cases,
extending a search to a larger corpus will fail to replicate the zero occurrence
in the smaller corpus. For example, the BNC contains one clear
example of donate with ditransitive complementation (1a), and a second
potential one (1b):

(1) a. Saudi king donates Laura transplant money. The king of Saudi
       Arabia has donated a hundred and fifty thousand pounds to Laura
       Davies ... (K1N)
    b. ... if the villagers hadn’t so kindly donated her furnishings, she’d
       probably still be existing in empty rooms ... (H95). 7

Faced with such examples, there is no longer any reason to believe
that [donate + ditransitive] is a significantly absent structure.
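The extrapolation from the ICE-GB to the BNC involves only a few steps of arithmetic, sketched below. The figures are those given in the text; the helper names are invented for illustration, and the p-value is again taken to be the one-tailed probability of zero occurrences, reported on a log10 scale so that very small values remain readable.

```python
import math

# Figures from the text: ICE-GB totals and the estimated number of
# complementation patterns in the BNC (Stefanowitsch and Gries 2003: 219).
ICE_DITRANSITIVES, ICE_PATTERNS = 1824, 136551
BNC_PATTERNS = 10206300

# Assumption stated in the text: the same proportion of ditransitives.
bnc_ditransitives = round(BNC_PATTERNS * ICE_DITRANSITIVES / ICE_PATTERNS)

def log_comb(m, r):
    """Natural log of the binomial coefficient C(m, r)."""
    return math.lgamma(m + 1) - math.lgamma(r + 1) - math.lgamma(m - r + 1)

def expected_and_log10_p(verb_freq, construction_freq, corpus_size):
    """Expected frequency and log10 of P(X = 0) for an unattested verb."""
    expected = verb_freq * construction_freq / corpus_size
    log_p = (log_comb(corpus_size - construction_freq, verb_freq)
             - log_comb(corpus_size, verb_freq))
    return expected, log_p / math.log(10)

print("estimated BNC ditransitives:", bnc_ditransitives)
for verb, freq in [("explain", 18334), ("whisper", 2976), ("oxidise", 99)]:
    expected, log10_p = expected_and_log10_p(freq, bnc_ditransitives, BNC_PATTERNS)
    print(f"{verb:8s} expected = {expected:7.2f}   log10(p) = {log10_p:8.2f}")
```

The whole calculation rests on the assumed proportion of ditransitives, so a different estimate for the BNC would shift every expected frequency accordingly.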

Thus, the methodological problem of insufficiently large corpora is
not, in principle, an argument against replacing intuitive grammaticality
judgments by negative corpus evidence (in practice it may be, a point
which I will return to below). Instead, the preceding discussion shows
that negative corpus evidence can be adduced for a pet case of syntactic
theorizing (note that among the significantly absent collexemes of the
ditransitive complementation pattern in Table 5, there are a large
number of famously non-ditransitive verbs, e.g., suggest, provide, say,
describe, etc.).

One may now argue that even if such negative corpus evidence can be
obtained, it does not add any insights that could not also be arrived at
by introspective acceptability judgments: at best (e.g., for say, explain,
whisper), it will confirm what we know from intuition anyway; in the
worst case, it will never yield enough data to decide the issue (e.g., oxidise)
or will contradict generally agreed-upon acceptability judgments (e.g.,
donate). There are two reasons why this argument is wrong. First, unlike
acceptability judgments, negative corpus evidence meets the standards of
scientific research. Second, it is only such corpus evidence that will allow
us to make principled statements about what is and is not possible: if we
hypothesize, based on acceptability judgments, that [whisper + ditransitive]
is impossible, a single counterexample can prove this wrong. While
such counterexamples may not occur even in very large corpora, they
are still easy to come by in the age of the Internet. A web search quickly
turns up counterexamples produced by native speakers of (British) English,
both with clausal and, more crucially, with nominal objects:

(2) a. ... when I first beheld you the instinct of Nature whispered me
       that we were in some degree related ... (Jane Austen, Love and
       Friendship, Letter 11)
    b. She had not been allowed to ... to bury the two people she had
       loved most in the world ... to whisper them a last goodbye. (Meg
       Hutchinson, Peppercorn Woman)

Of course, <strong>the</strong>se examples can in turn be questioned; <strong>the</strong>y may reflect<br />

older stages of <strong>the</strong> language (Jane Austen wrote Love <strong>and</strong> Friendship<br />

around 1790), <strong>the</strong>y may reflect regional dialects (Meg Hutchinson lives<br />

in Staffordshire), etc. However, <strong>the</strong> fact remains that we have objective<br />

au<strong>the</strong>ntic examples pitched against subjective intuition. In contrast, <strong>the</strong><br />

negative corpus <strong>evidence</strong> obtained from <strong>the</strong> BNC gives us an objective<br />

basis for arguing that, even though utterances like (2) can occur, <strong>the</strong>y<br />

are very marginal. This allows us to uphold <strong>the</strong> useful generalization<br />

that communication verbs can be used with ditransitive syntax if <strong>the</strong>y<br />

refer to <strong>the</strong> type of message communicated, but not if <strong>the</strong>y refer to <strong>the</strong><br />

manner in which it is communicated (Pinker 1989), even though we have<br />

to reformulate it as a strong statistical tendency instead of a categorical<br />

constraint. The same holds for donate. Even if we generously admit both<br />

(1a <strong>and</strong> 1b) as counterexamples, we can still point out that donate would


<strong>Negative</strong> <strong>evidence</strong> <strong>and</strong> <strong>the</strong> <strong>raw</strong> <strong>frequency</strong> fallacy 71<br />

have been expected to occur 14 times (based on <strong>the</strong> assumptions above),<br />

and thus still constitutes a strongly repelled collexeme (p &lt; 0.0001;<br />


72 A. Stefanowitsch<br />

whe<strong>the</strong>r zero deviates significantly from this expected <strong>frequency</strong> (<strong>the</strong> sufficient<br />

condition for upholding <strong>the</strong> hypo<strong>the</strong>sis). This information will<br />

likely be more difficult to obtain or estimate than information about<br />

complementation patterns, but to do so is by no means impossible.<br />

Any hypo<strong>the</strong>sis about possible <strong>and</strong> impossible structures in language<br />

is ultimately a hypo<strong>the</strong>sis about <strong>the</strong> incompatibility of two (or more)<br />

linguistic categories. As long as <strong>the</strong>se categories can be operationalized<br />

in such a way that <strong>the</strong>y can be exhaustively annotated (or identified<br />

spontaneously) in a corpus of naturally occurring language, <strong>and</strong> as long<br />

as <strong>the</strong> corpus is large enough, this corpus can provide both positive <strong>and</strong><br />

negative <strong>evidence</strong>. The first condition should always be met: if a category<br />

cannot be operationalized for objective identification, it has no place in<br />

a linguistic <strong>the</strong>ory. The second condition is not currently met. There are<br />

several syntactically annotated corpora (for example, <strong>the</strong> Penn Treebank,<br />

Sampson’s Suzanne <strong>and</strong> Christine corpora, <strong>and</strong> <strong>the</strong> ICE-GB used in this<br />

note), but <strong>the</strong>y are ei<strong>the</strong>r too small for many research questions, or <strong>the</strong>ir<br />

annotation scheme is too coarse or too unreliable, or both. However,<br />

this cannot seriously be used as a defense of <strong>the</strong> introspective method.<br />

Instead, it must be used as an argument for <strong>the</strong> funding <strong>and</strong> <strong>the</strong> human<br />

resources necessary for <strong>the</strong> construction of large grammatically annotated<br />

corpora. A discipline can only get so far by thought experiments (if<br />

that is what acceptability judgments are). It begins to make substantial<br />

headway only when it faces up to <strong>the</strong> problem of data scarcity <strong>and</strong> solves<br />

it. Astronomers have built radio telescopes, physicists have built particle<br />

colliders, <strong>and</strong> geneticists have sequenced <strong>the</strong> human genome; linguists<br />

should be able to construct large, balanced, syntactically annotated corpora<br />

of at least the world’s major languages. But even until this goal is<br />

reached (or, more likely, in case it is never reached), corpora can yield<br />

both positive <strong>and</strong> negative <strong>evidence</strong> for <strong>the</strong> construction of linguistic<br />

<strong>the</strong>ories.<br />

Final remarks: <strong>the</strong> occurring <strong>and</strong> <strong>the</strong> non-occurring<br />

The main point of this note was to show that corpora contain negative<br />

<strong>evidence</strong> <strong>and</strong> that this negative corpus <strong>evidence</strong> can, <strong>and</strong> should, replace<br />

introspective acceptability judgments. It seems appropriate, however, to<br />

discuss <strong>the</strong> most important <strong>the</strong>oretical implications of such a step.<br />

First, from <strong>the</strong> perspective advocated here, <strong>the</strong> non-occurrence of a<br />

particular linguistic structure is merely <strong>the</strong> limiting case; it is not qualitatively<br />

different from very rare occurrences. This may seem to be a problem<br />

for an approach that argues for an absolute distinction between<br />

possible <strong>and</strong> impossible configurations of linguistic categories (for example,<br />

between grammatical <strong>and</strong> ungrammatical structures). This problem



may be more apparent than real, however. The continuum between significantly<br />

rare <strong>and</strong> significantly absent structures is not fundamentally<br />

different from <strong>the</strong> continuum between various degrees of unacceptability<br />

that is regularly found for acceptability ratings. In both cases, <strong>the</strong> data<br />

must be viewed in light of one’s <strong>the</strong>ory of language in order to make<br />

sense of this continuum. Also, it may well be possible to identify a degree<br />

of improbability that is close enough to impossibility to be indistinguishable<br />

from it.<br />

Second, while <strong>the</strong> statistically significant absence (or rareness) of a<br />

particular configuration of grammatical categories can be taken as <strong>evidence</strong><br />

that this configuration is impossible (i. e., very improbable), it does<br />

not, in itself, provide any clues as to why this should be <strong>the</strong> case. Again,<br />

<strong>the</strong> same is true of introspective judgments. Chomsky pointed this out<br />

early on: “The notion ‘acceptable’ is not to be confused with ‘grammatical’.<br />

Acceptability belongs to <strong>the</strong> study of performance whereas grammaticalness<br />

belongs to <strong>the</strong> study of competence” (Chomsky 1965: 11). A<br />

linguistic structure may give rise to introspective judgments of unacceptability<br />

for a number of reasons, of which ungrammaticality (or, more<br />

generally, failure to conform to general linguistic rules) is just one. What<br />

that reason is must be determined independently of <strong>the</strong> acceptability<br />

judgment. The same is true of significantly absent (or rare) structures:<br />

determining significant absence/rareness is just <strong>the</strong> first step of a linguistic<br />

analysis. The second step is to determine <strong>the</strong> reasons for <strong>the</strong> significant<br />

absence/rareness. This step can be much closer to traditional linguistic<br />

argumentation. First, it may involve <strong>the</strong> search for au<strong>the</strong>ntic counterexamples<br />

(as in <strong>the</strong> case of whisper above) in order to test <strong>the</strong> extent of<br />

this absence. This may uncover variation in <strong>the</strong> data (panchronic, regional,<br />

social, etc.) or particular contexts in which seemingly impossible<br />

structures become possible. Second, it may involve constructing examples<br />

in order to determine whe<strong>the</strong>r <strong>the</strong> significant absence is semantically<br />

determined. If <strong>the</strong> constructed examples are not interpretable, <strong>the</strong> absence<br />

may simply be due to semantic incompatibility. For example, no<br />

interpretation can be assigned to He knew her <strong>the</strong> answer or She saw<br />

him <strong>the</strong> light. If <strong>the</strong> constructed examples are interpretable, <strong>the</strong>ir absence<br />

cannot be due to semantic incompatibility but may instead have purely<br />

formal reasons. For example, He said her <strong>the</strong> answer or She put him <strong>the</strong><br />

book are straightforwardly interpretable (of course, <strong>the</strong>re may be more<br />

fine-grained semantic restrictions as <strong>the</strong> huge literature on ditransitives<br />

shows). In o<strong>the</strong>r words, while I argue against <strong>the</strong> use of acceptability<br />

judgments as a linguistic method, I do not argue against <strong>the</strong> use of interpretation.<br />

There is good reason for this distinction, which I am not <strong>the</strong><br />

first to point out: interpreting utterances is a natural human activity,<br />

judging <strong>the</strong>ir acceptability is not.



Third, while it is plausible to speak of different degrees of attraction<br />

or repulsion in <strong>the</strong> case of combinations that do occur, it is less clear<br />

whe<strong>the</strong>r it makes sense to speak of different degrees of absence, as <strong>the</strong><br />

ranking of significantly absent collexemes in Table 5 suggests. Methodologically,<br />

this ranking merely reflects <strong>the</strong> certainty with which we can<br />

say that a structure is impossible. One may (but need not) argue, though,<br />

that this certainty reflects <strong>the</strong> certainty of a native speaker, in which case<br />

<strong>the</strong> ‘degrees of absence’ do become relevant to <strong>the</strong>oretical considerations.<br />

Whe<strong>the</strong>r <strong>the</strong> predictions of such a view are borne out by empirical data<br />

remains to be seen.<br />

More generally, it seems to me that accepting <strong>the</strong> methodology I have<br />

argued for here may lead to a slight but pervasive reorientation of linguistic<br />

<strong>the</strong>ory. If we accept significant presence <strong>and</strong> significant absence<br />

(as well as significant <strong>frequency</strong> <strong>and</strong> rareness) as <strong>the</strong> primary facts that<br />

a linguistic <strong>the</strong>ory must explain, <strong>the</strong>n this <strong>the</strong>ory will have to be broader<br />

than most current <strong>the</strong>ories. Ra<strong>the</strong>r than focusing exclusively on grammaticality,<br />

such a <strong>the</strong>ory would have to uncover <strong>the</strong> whole range of<br />

causes for <strong>the</strong> presence <strong>and</strong> absence of linguistic structures <strong>and</strong> investigate<br />

all of <strong>the</strong>m with <strong>the</strong> same degree of rigor <strong>and</strong> explicitness. The aim<br />

of linguistic analysis would no longer be “to separate <strong>the</strong> grammatical<br />

sequences which are <strong>the</strong> sentences of [a language] L from <strong>the</strong> ungrammatical<br />

sequences which are not sentences of L <strong>and</strong> to study <strong>the</strong> structure<br />

of <strong>the</strong> grammatical sentences” (Chomsky 1957: 13). Instead, <strong>the</strong><br />

aim would be to provide for individual languages <strong>and</strong>, ultimately, for<br />

language in general a comprehensive <strong>the</strong>ory of <strong>the</strong> occurring <strong>and</strong> <strong>the</strong><br />

non-occurring. 8<br />

Received January 2006<br />

Revisions received March 2006<br />

Final acceptance March 2006<br />

University of Bremen<br />

Notes<br />

* I would like to thank Stefan Gries, Arne Zeschel <strong>and</strong> <strong>the</strong> participants of <strong>the</strong> 7.<br />

Norddeutsches Linguistisches Kolloquium for <strong>the</strong>ir comments on <strong>the</strong> ideas presented<br />

in this paper. Any conceptual errors are mine alone.<br />

1. Actually, <strong>the</strong>re are several potential reasons for <strong>the</strong> oddness of McEnery <strong>and</strong> Wilson’s<br />

example (for example, <strong>the</strong> use of <strong>the</strong> simple present <strong>and</strong> <strong>the</strong> potential violation<br />

of <strong>the</strong> selection restrictions of <strong>the</strong> verb shine by <strong>the</strong> direct object NP books).<br />

Their discussion suggests, however, that <strong>the</strong>y are concerned with complementation.<br />

2. An overview of this method and its place in the corpus-based study of grammatical<br />

patterns will be provided in Stefanowitsch <strong>and</strong> Gries (to appear b); meanwhile,<br />

an introduction can be found on my website at . This website also provides a number of Perl scripts for doing<br />

collostructional analysis (PerlClx 1.0); cf. also Gries’ R script CollAnalysis 3, available<br />

from his website at . Incidentally, both scripts can provide <strong>the</strong> corpus <strong>frequency</strong><br />

that a word not occurring in a particular construction must have in order<br />

for its absence to be significant given <strong>the</strong> <strong>frequency</strong> of <strong>the</strong> construction <strong>and</strong> <strong>the</strong><br />

size of <strong>the</strong> corpus. CollAnalysis provides this information as part of a collostructional<br />

analysis; PerlClx contains a script (zclx.pl) dedicated exclusively to this purpose.<br />
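The threshold described in note 2 can be illustrated with a minimal sketch. It assumes a simple binomial model in which every token of the word has the same chance of appearing in the construction. This is an illustrative simplification, not the procedure implemented in zclx.pl or CollAnalysis, and the function name and the example counts are hypothetical.

```python
import math


def min_frequency_for_significant_absence(construction_freq, corpus_size, alpha=0.05):
    """Smallest word frequency at which zero co-occurrences is significant.

    Assumed model (not the scripts' actual method): each token of the word
    falls into the construction independently with probability
    construction_freq / corpus_size, so P(zero hits) = (1 - rate) ** n.
    """
    rate = construction_freq / corpus_size
    if not 0 < rate < 1:
        raise ValueError("construction_freq must be between 0 and corpus_size")
    # Solve (1 - rate) ** n < alpha for n.
    n = math.ceil(math.log(alpha) / math.log1p(-rate))
    # Guard against floating-point rounding at the boundary.
    while (1 - rate) ** n >= alpha:
        n += 1
    return n


# Hypothetical counts: a construction filling 1,000 of 100,000 verb slots.
needed = min_frequency_for_significant_absence(1_000, 100_000)
print(needed)
```

Under this model, a word that is rarer than the returned frequency cannot be shown to be significantly absent from the construction, however plausible its absence may seem.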

3. The Bonferroni correction is meant to place stricter requirements on statistical<br />

significance in situations where multiple tests are performed on <strong>the</strong> same data set:<br />

obviously, <strong>the</strong> more tests you perform, <strong>the</strong> more chances <strong>the</strong>re are for a seemingly<br />

significant result to come about by accident. However, some have argued (for<br />

example, Perneger 1998) that this correction does more harm than good because<br />

it removes many results that are significant. I will not place too much emphasis<br />

here on this correction. In a sense, whe<strong>the</strong>r one has to apply it in <strong>the</strong> context of<br />

collostructional analysis or not depends on one’s view of what one is doing: is<br />

one testing individual word-construction pairs (in which case each test can st<strong>and</strong><br />

on its own <strong>and</strong> one could be less concerned with correcting) or is one testing a<br />

construction <strong>and</strong> all words occurring in it (in which case one could be more concerned<br />

with correcting).<br />

4. These results differ from those that Gries <strong>and</strong> I have presented previously, mainly<br />

because we have focused exclusively on ditransitives with two nominal objects,<br />

whereas I have included here all uses tagged as ‘ditransitive’ in <strong>the</strong> ICE-GB, including<br />

those with a clausal direct object. Fur<strong>the</strong>rmore, here <strong>and</strong> throughout <strong>the</strong><br />

following discussion I have used regular expressions to search <strong>the</strong> corpus files<br />

directly, ra<strong>the</strong>r than using ICECUP, <strong>the</strong> software tool that accompanies <strong>the</strong> ICE-<br />

GB. I have also discarded all verbs marked as ‘ignored’ in <strong>the</strong> corpus annotation<br />

<strong>and</strong> I have discarded all unclear words. I have manually lemmatized <strong>the</strong> verbs<br />

<strong>and</strong> st<strong>and</strong>ardized spelling variants. Finally, I treat phrasal verbs as lemmas in<br />

<strong>the</strong>ir own right (cf. give back in Table 2); <strong>the</strong>y were identified by searching for<br />

verbs that were followed (<strong>and</strong> in some cases preceded) by a particle annotated<br />

as such.<br />

5. Note that cost does not behave like a typical transitive verb (whe<strong>the</strong>r in its monotransitive<br />

or its ditransitive use). For example, it cannot be passivized: *Three quid<br />

were cost (<strong>the</strong>m), *They were cost three quid. Thus, <strong>the</strong> apparent direct object may<br />

be better analyzed as an oblique (for example, a subject complement or an adjunct,<br />

cf. e. g., Quirk et al. 1985, § 16.27).<br />

6. Since we are conducting individual tests here based on hypo<strong>the</strong>ses about specific<br />

verbs, we could argue that <strong>the</strong> levels of significance do not have to be adjusted<br />

for multiple testing. However, since <strong>the</strong>re are thous<strong>and</strong>s of tests of <strong>the</strong> same kind<br />

(verb ditransitive) that we could have performed, it might be a good idea to<br />

correct for multiple testing anyway. According to Leech et al.’s (2001) <strong>frequency</strong><br />

list, <strong>the</strong>re are 38,019 verb types in <strong>the</strong> BNC; if anything, this is an overestimation<br />

(since inaccurately tokenized forms like ["see] [&mdash;see] [see?] etc. are all<br />

counted as <strong>the</strong>ir own lemmas), so if we correct on this basis, we are on <strong>the</strong> safe<br />

side. The corrected level of significance is 1.32E-06; both explain and whisper<br />

clear this level by several orders of magnitude.<br />
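The corrected significance level in note 6 follows from dividing the uncorrected level by the number of possible tests. The sketch below assumes the conventional 5% level as the starting point (with 38,019 verb types, this is the value that yields 1.32E-06); the function names are illustrative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Return the per-test significance level after Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def clears_threshold(p_value, alpha, n_tests):
    """True if p_value is significant at the Bonferroni-corrected level."""
    return p_value < bonferroni_threshold(alpha, n_tests)


# 38,019 verb types (Leech et al. 2001); alpha = 0.05 is an assumption.
corrected = bonferroni_threshold(0.05, 38_019)
print(f"{corrected:.2E}")  # 1.32E-06
```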

7. Due to <strong>the</strong> ambiguity of <strong>the</strong> pronominal form her, it is not clear whe<strong>the</strong>r this<br />

example is monotransitive (They donated [NP her furnishings]) or ditransitive<br />

(They donated [NP her] [NP furnishings]). However, a web search turns up additional<br />

clear (if rare) examples of ditransitive uses of donate, for example, In May<br />

2004, Cycle Heaven, a local retailer, <strong>and</strong> City of York Council threw down <strong>the</strong>



gauntlet to local schools, saying ‘achieve your target increases in walking <strong>and</strong> cycling<br />

by summer 2005, <strong>and</strong> we will donate you a free, high quality children’s bike! (http://<br />

www.york.gov.uk/cgi-bin/wn_document.pl?type 5927).<br />

8. I do not want to conclude this note without applying <strong>the</strong> by now familiar reasoning<br />

to McEnery <strong>and</strong> Wilson’s question how <strong>the</strong> ungrammaticality of *He shines<br />

Tony books could be determined without intuition judgments. Shine (in all its<br />

senses) occurs 2,258 times in <strong>the</strong> BNC. On <strong>the</strong> assumptions made above, <strong>the</strong> expected<br />

<strong>frequency</strong> of shine with ditransitive complementation would be 30; <strong>the</strong><br />

observed frequency is zero. This difference is highly significant (p = 6.47E-14)<br />

even if we correct for multiple testing. Thus, without resorting to introspection,<br />

we have proved that [shine ditransitive] is significantly absent. Whe<strong>the</strong>r this is<br />

strictly due to ungrammaticality is doubtful: first, McEnery <strong>and</strong> Wilson’s sentence<br />

is interpretable (albeit weird); second, it is possible to find au<strong>the</strong>ntic ditransitive<br />

uses by certified native speakers of English for both senses of shine: (i) Shine me<br />

a light from your eyes dear (Christine McVie, Show me a Smile [performed by<br />

Fleetwood Mac]); (ii) He smiles telling him to shine him a metallic Purple armor<br />

(Jimi Hendrix, Bold as Love). Thus, we could hypothesize that ditransitive uses of<br />

shine are semantically so restricted that <strong>the</strong>y occur only in very specific circumstances<br />

(e. g., <strong>the</strong> ‘light’ reading can only occur ditransitively when <strong>the</strong> direct object<br />

is light) or that <strong>the</strong>y only occur in certain dialects (e. g., Lancashire [McVie]<br />

<strong>and</strong> Seattle [Hendrix]) or registers (e. g., rock lyrics).<br />
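The reasoning in note 8 can be approximated in a few lines of code. The sketch assumes a binomial model in which the per-token probability of ditransitive use is the expected frequency divided by the frequency of the word. It is meant to illustrate the logic, not to replicate the test behind the reported p-value, so it should land in the same very small range as that value without matching it exactly.

```python
import math


def prob_zero_occurrences(word_freq, expected_freq):
    """Probability of observing zero hits under a simple binomial model.

    Assumption: each of the word_freq tokens is ditransitive independently
    with probability expected_freq / word_freq.
    """
    rate = expected_freq / word_freq
    if not 0 < rate < 1:
        raise ValueError("expected_freq must be between 0 and word_freq")
    # (1 - rate) ** word_freq, computed in log space for numerical stability.
    return math.exp(word_freq * math.log1p(-rate))


# Figures from note 8: shine occurs 2,258 times; 30 ditransitive uses expected.
p_zero = prob_zero_occurrences(2_258, 30)
print(f"{p_zero:.2E}")

# Compare against a Bonferroni-corrected level (alpha = 0.05 is an assumption).
print(p_zero < 0.05 / 38_019)
```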

References<br />

Bonferroni, Carlo E.<br />

1936 Teoria statistica delle classi e calcolo delle probabilità. Pubblicazioni del<br />

R Istituto Superiore di Scienze Economiche e Commerciali di Firenze 8,<br />

3–62.<br />

Chomsky, Noam<br />

1957 Syntactic Structures. The Hague: Mouton.<br />

1965 Aspects of <strong>the</strong> Theory of Syntax. Cambridge, MA: MIT Press.<br />

Gries, Stefan Th. <strong>and</strong> Anatol Stefanowitsch<br />

2004a Extending collostructional analysis: a corpus-based perspective on ‘alternations’.<br />

International Journal of Corpus Linguistics 9(1), 97–129.<br />

2004b Co-varying collexemes in the into-causative. In: Achard, Michel and Suzanne<br />

Kemmer (eds.), Language, Culture, and Mind. Stanford: CSLI,<br />

225–236.<br />

To appear Cluster analysis <strong>and</strong> <strong>the</strong> identification of collexeme classes. In John Newman<br />

<strong>and</strong> Sally Rice (eds.), Empirical <strong>and</strong> Experimental Methods in Cognitive/Functional<br />

Research. Stanford: CSLI.<br />

Leech, Geoffrey, Paul Rayson, <strong>and</strong> Andrew Wilson<br />

2001 Word Frequencies in Written <strong>and</strong> Spoken English: Based on <strong>the</strong> British<br />

National Corpus. London: Longman.<br />

McEnery, Tony, <strong>and</strong> Andrew Wilson<br />

2001 Corpus Linguistics. An Introduction. Second edition. Edinburgh: Edinburgh<br />

University Press.<br />

Nelson, Gerald, Sean Wallis <strong>and</strong> Bas Aarts (eds.)<br />

2002 Exploring Natural Language: Working with <strong>the</strong> British Component of <strong>the</strong><br />

International Corpus of English. Amsterdam <strong>and</strong> Philadelphia: John Benjamins.



Pedersen, Ted<br />

1996 Fishing for exactness. Proceedings of <strong>the</strong> South Central SAS User’s Group<br />

Conference, Austin, TX, 188–200.<br />

Perneger, Thomas V.<br />

1998 What’s wrong with Bonferroni adjustments. British Medical Journal 316,<br />

1236–1238.<br />

Pinker, Steven<br />

1989 Learnability <strong>and</strong> Cognition. The Acquisition of Argument Structure. Cambridge,<br />

MA: MIT Press.<br />

Quirk, R<strong>and</strong>olph, Sidney Greenbaum, Geoffrey Leech, <strong>and</strong> Jan Svartvik<br />

1985 A Comprehensive Grammar of <strong>the</strong> English Language. London: Longman.<br />

Stefanowitsch, Anatol<br />

2005 New York, Dayton (Ohio), <strong>and</strong> <strong>the</strong> Raw Frequency Fallacy. Corpus Linguistics<br />

and Linguistic Theory 1(2), 295–301.<br />

Stefanowitsch, Anatol <strong>and</strong> Stefan Th. Gries<br />

2003 Collostructions: investigating <strong>the</strong> interaction of words <strong>and</strong> constructions.<br />

International Journal of Corpus Linguistics 8(2), 209–243.<br />

2005 Covarying Collexemes. Corpus Linguistics and Linguistic Theory 1(1),<br />

1–43.<br />

To appear a Channel <strong>and</strong> constructional meaning: A collostructional case study. In:<br />

Kristiansen, Gitte <strong>and</strong> René Dirven (eds.), Cognitive Sociolinguistics:<br />

Language Variation, Cultural Models, Social Systems. Berlin <strong>and</strong> New<br />

York: Mouton de Gruyter.<br />

To appear b Corpora and Grammar. In: Anke Lüdeling, Merja Kytö, and Tony<br />

McEnery. Corpus Linguistics (H<strong>and</strong>books of Linguistics <strong>and</strong> Communication<br />

Science/HSK). Berlin: Mouton de Gruyter.
