NOTE
Negative evidence and the raw frequency fallacy*
ANATOL STEFANOWITSCH
Introduction
There is little that is more completely accepted in the conventional wisdom of modern linguistics than the assumption that corpora do not contain negative evidence and that, therefore, intuition-based acceptability judgments are an indispensable part of linguistic methodology. This assumption goes back at least to Chomsky's discussion of grammaticality in Syntactic Structures (Chomsky 1957: 15 ff.), whose claims can perhaps be excused on the basis that he was writing before the advent of modern corpus linguistics. More worrying, however, is that many modern corpus linguists still share this assumption.
For example, in what is otherwise one of the most thorough and most thoughtful textbooks on corpus linguistics currently available, McEnery and Wilson (2001: 11) ask:
Without recourse to introspective judgments, how can ungrammatical utterances be distinguished from ones that simply haven't occurred yet? If our finite corpus does not contain the sentence:
*He shines Tony books.
how do we conclude that it is ungrammatical?
And without any discussion of potential alternatives they promptly give the following answer (McEnery and Wilson 2001: 12):
It is only by asking a native or expert speaker of a language for their opinion of the grammaticality of a sentence that we can hope to differentiate unseen but grammatical constructions from those which are simply ungrammatical and unseen.
They conclude their discussion by stating that "we [corpus linguists] must not eschew introspection entirely. If we do, detecting ungrammatical structures and ambiguous structures becomes difficult and, indeed, may be impossible" (McEnery and Wilson 2001: 12).
Corpus Linguistics and Linguistic Theory 2–1 (2006), 61–77
DOI 10.1515/CLLT.2006.003
1613-7027/06/0002-0061
© Walter de Gruyter
In this note, I would like to take issue with (a large part of) their argument. I will argue that the idea that corpora do not contain negative evidence is simply a special case of what I have termed the observed-frequency (or raw-frequency) fallacy, i.e., the belief that "[o]bserved frequencies of occurrence represent relevant facts for scientific analysis" (Stefanowitsch 2005: 296). When approached with the right methodological tools, corpora do provide negative evidence, i.e., evidence that allows us, in principle, to distinguish between constructions that did not occur but could have (these could be referred to as 'accidentally absent'), and constructions that did not occur and could not have (these can be referred to as 'significantly absent' structures). Thus, while I do agree that linguists cannot (and should not) 'eschew introspection entirely', I will argue that they can (and largely should) eschew introspective judgments of acceptability.
Collostructional analysis and the significance of absence
In this section, I will address the general issue of how significant absences of a particular configuration of linguistic elements can be distinguished from accidental ones, using as an example the 'ability' or 'inability' of English verbs to occur with ditransitive complementation. The choice of this example is motivated primarily by practical considerations: as will presently become clear, the method I will use requires the researcher to extract exhaustively from a corpus all occurrences of the grammatical phenomenon in question. Ditransitive complementation happens to be one of the features that are relatively uncontroversially tagged in the largest grammatically annotated balanced corpus currently available, the British component of the International Corpus of English (ICE-GB, cf. Nelson et al. 2002). However, it is a welcome coincidence that this is precisely the complementation pattern that McEnery and Wilson chose to demonstrate the need for grammaticality judgments.¹
The relevant method is one of several that Gries and I have developed in a series of publications specifically for the purpose of investigating the relationship between grammatical constructions and the words occurring in them. We refer to these methods collectively as collostructional analysis (cf., e.g., Stefanowitsch and Gries 2003, 2005, to appear a; Gries and Stefanowitsch 2004a, b, to appear).² The most basic of these methods, simple collexeme analysis, allows the researcher to identify words that occur significantly more or less frequently than expected in a given slot of a construction. This is done on the basis of a standard 2-by-2 contingency table containing four observed frequencies: (a) the frequency of a given word in a particular slot of a given construction, (b) the frequency of the same word in the corresponding slots of all other constructions, (c) the frequency of all other words in the relevant slot of the construction under investigation, and (d) the frequency of all other words in the corresponding slot of all other constructions. From these frequencies, we can derive the expected frequency of occurrence of the word in the construction. The expected frequency of a cell is the product of its row total and its column total divided by the grand total; for give in the ditransitive, this is 1,091 × 1,824 / 136,551 ≈ 14.57. The expected frequency allows us to determine whether, and in what direction, the observed frequency deviates from it, and whether this deviation is statistically significant. As an example, consider Table 1, which shows the relevant contingency table for the verb give and the ditransitive complementation pattern in the ICE-GB (expected frequencies are shown in parentheses).

Table 1.
Give with ditransitive complementation in the ICE-GB
           Ditransitive        ¬Ditransitive          Total
give       560 (14.57)         531 (1,076.43)         1,091
¬give      1,264 (1,809.43)    134,196 (133,650.57)   135,460
Total      1,824               134,727                136,551
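The arithmetic behind the parenthesized values can be sketched in a few lines of Python. This is a minimal illustration under the standard independence model, not the software used for this note. The function name `expected_frequencies` and the argument order (cells a to d as listed above) are my own choices.

```python
def expected_frequencies(a, b, c, d):
    """Expected cell frequencies of a 2-by-2 table under independence.

    a: word in the construction, b: word in all other constructions,
    c: all other words in the construction, d: all other words elsewhere.
    """
    n = a + b + c + d
    rows = (a + b, c + d)   # the word / all other words
    cols = (a + c, b + d)   # the construction / all other constructions
    return [[r * k / n for k in cols] for r in rows]


# Observed frequencies for give and the ditransitive (Table 1)
table = expected_frequencies(560, 531, 1264, 134196)
for label, row in zip(("give", "other verbs"), table):
    print(label, [round(x, 2) for x in row])
```

Rounded to two decimals, this reproduces the four expected frequencies of Table 1, and the expected cells sum to the grand total of 136,551.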
As Table 1 shows, give occurs vastly more frequently than expected with ditransitive complementation. The Fisher-Yates exact test shows that this difference is highly significant (p < 4.94E-324, the smallest positive number a typical current home computer can represent in standard double-precision arithmetic). In collostructional analysis, we usually take the p-value directly as a measure of association strength (cf. Pedersen 1996 and Stefanowitsch and Gries 2003: 238 f. for justification). In other words, the extremely small p-value is taken to be an indication of an extremely strong association between give and the ditransitive complementation pattern.
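The Fisher-Yates exact test can be computed from the hypergeometric distribution. The sketch below is an illustrative reimplementation in plain Python, not the tool used for the tables in this note. It assumes a one-tailed test in the direction of the deviation, as the zero-count figures in Tables 4 and 5 suggest. The helper names are hypothetical, and log-gamma arithmetic (`math.lgamma`) is used so that corpus-sized tables do not overflow.

```python
from math import exp, lgamma


def log_choose(n, k):
    """Natural logarithm of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def hypergeom_pmf(x, total, in_cxn, word_freq):
    """Probability that exactly x of the word's tokens fall into the construction."""
    return exp(log_choose(in_cxn, x)
               + log_choose(total - in_cxn, word_freq - x)
               - log_choose(total, word_freq))


def fisher_one_tailed(a, b, c, d):
    """One-tailed Fisher-Yates exact test in the direction of the deviation.

    a, b, c, d are the cells of the 2-by-2 table in the order used in the text.
    Returns (expected, p): p = P(X >= a) for an attracted word and
    p = P(X <= a) for a repelled word.
    """
    total = a + b + c + d
    in_cxn = a + c       # all tokens of the construction
    word_freq = a + b    # all tokens of the word
    expected = word_freq * in_cxn / total
    lowest = max(0, word_freq - (total - in_cxn))
    highest = min(word_freq, in_cxn)
    tail = range(a, highest + 1) if a >= expected else range(lowest, a + 1)
    p = sum(hypergeom_pmf(x, total, in_cxn, word_freq) for x in tail)
    return expected, min(p, 1.0)


# give (Table 1): the p-value underflows to 0.0 in double precision
e_give, p_give = fisher_one_tailed(560, 531, 1264, 134196)
# make (Table 3): b = 1,865 - 3, c = 1,824 - 3, d = 136,551 - 1,865 - 1,821
e_make, p_make = fisher_one_tailed(3, 1862, 1821, 132865)
print(f"give: expected {e_give:.2f}, p = {p_give:.3g}")
print(f"make: expected {e_make:.2f}, p = {p_make:.3g}")
```

For give, the sketch returns p = 0, because the true value lies below the smallest representable double. For make, it returns a value close to the 3.39E-08 listed in Table 3.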
Repeating this procedure for all verbs occurring with ditransitive complementation in the ICE-GB allows us to rank all verbs, first, by whether they occur more or less frequently than expected and, second, by association strength. Words that occur more frequently than expected are referred to as attracted collexemes (the strength of their positive association can be referred to as attraction strength). Words that occur less frequently than expected are referred to as repelled collexemes (with a corresponding repulsion strength). For example, all verbs occurring significantly more frequently than expected are shown in Table 2. The significance level of 0.05 was corrected for multiple testing using a simple Bonferroni correction (Bonferroni 1936), whereby the significance level is divided by the number of tests. Since the ICE-GB contains 4,856 verb types, this gives us 0.05/4,856 ≈ 1.03E-05.³
Table 2.
Significantly attracted collexemes of the ditransitive in the ICE-GB
(F(Corpus): total frequency of the verb in the ICE-GB; F_O(Ditr): observed frequency with ditransitive complementation; F_E(Ditr): expected frequency; p: Fisher-Yates exact test)
Collexeme    F(Corpus)   F_O(Ditr)   F_E(Ditr)   p (Fisher-Yates exact)
give         1,091       560         14.57       0.00E+00
tell         792         493         10.58       0.00E+00
send         295         78          3.94        4.13E-76
ask          504         92          6.73        9.65E-74
show         628         84          8.39        5.15E-56
offer        196         54          2.62        3.73E-54
convince     32          23          0.43        1.70E-36
cost         65          23          0.87        9.04E-27
inform       55          20          0.73        9.57E-24
teach        92          23          1.23        7.94E-23
assure       19          13          0.25        1.04E-20
remind       41          16          0.55        7.25E-20
lend         31          12          0.41        3.48E-15
promise      43          12          0.57        3.26E-13
owe          25          9           0.33        2.24E-11
grant        26          9           0.35        3.38E-11
warn         38          10          0.51        5.94E-11
award        16          7           0.21        7.72E-10
persuade     33          8           0.44        1.03E-08
allow        326         20          4.35        2.59E-08
guarantee    27          7           0.36        5.27E-08
deny         51          8           0.68        3.82E-07
earn         56          8           0.75        8.03E-07
hand         16          5           0.21        1.63E-06
pay          395         18          5.28        8.66E-06
give back    4           3           0.05        9.42E-06
The list of verbs in this table could now serve as a basis for a variety of observations, for example about the meaning of the ditransitive complementation pattern. I will not pursue this issue here (but cf. Stefanowitsch and Gries 2003, Section 3.2.2).⁴ Instead, let me point out two facts about the way that the label 'ditransitive' is applied in the ICE-GB. First, structures with nominal and with clausal direct objects are included under this label (i.e., uses like She told me that she wants to be free of lawyers and doctors [ICE-GB s2a-062 133] or I told him to drive the forklift truck [ICE-GB s2a-067 050], as well as the more obvious I've told you the truth [ICE-GB w2f-006 213]). Second, some verbs are tagged as ditransitive even though their second object might be better analyzed as an oblique argument, e.g., cost, as in It cost them three quid [ICE-GB s1a-007 054].⁵ In other words, the label is applied rather generously.
Next, consider Table 3, which shows the significantly repelled collexemes, sorted by repulsion strength (only the first two are significant at corrected levels).
Table 3.
Significantly repelled collexemes of the ditransitive in the ICE-GB
Collexeme    F(Corpus)   F_O(Ditr)   F_E(Ditr)   p (Fisher-Yates exact)
make         1,865       3           24.91       3.39E-08
do           2,937       12          39.23       2.56E-07
find         854         2           11.41       7.96E-04
call         616         1           8.23        2.32E-03
keep         374         1           5.00        3.95E-02
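The two-step ranking and the Bonferroni-corrected threshold described above can be sketched as follows. This is an illustration only. The rows are a small hand-picked sample copied from Tables 2 and 3, and the helper names (`direction`, `significance`) are hypothetical.

```python
ALPHA = 0.05
N_TESTS = 4856                   # verb types in the ICE-GB, one test per type
THRESHOLD = ALPHA / N_TESTS      # Bonferroni-corrected level, about 1.03E-05

# (verb, observed, expected, p-value), copied from Tables 2 and 3
rows = [
    ("keep", 1, 5.00, 3.95e-02),
    ("pay", 18, 5.28, 8.66e-06),
    ("find", 2, 11.41, 7.96e-04),
    ("give", 560, 14.57, 0.0),
    ("make", 3, 24.91, 3.39e-08),
]


def direction(observed, expected):
    return "attracted" if observed > expected else "repelled"


def significance(p):
    if p < THRESHOLD:
        return "corrected"
    if p < ALPHA:
        return "uncorrected"
    return "n.s."


# Attracted collexemes first, then repelled ones; within each group,
# smaller p-values (stronger association) come first.
ranked = sorted(rows, key=lambda r: (direction(r[1], r[2]) == "repelled", r[3]))

for verb, obs, exp_, p in ranked:
    print(f"{verb:6} {direction(obs, exp_):9} p={p:.2e} {significance(p)}")
```

Because the p-value serves as the measure of association strength, ranking within each direction reduces to a sort on p.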
One could now ask why verbs that occur in a given construction might do so less frequently than expected. There are several reasons, some of them more interesting than others. First, verbs may appear on this list because they are incorrectly tagged (in this case, call, which is tagged as 'ditransitive' in the utterance And the person who's being called [ICE-GB s1a-030 003]). Obviously, such incorrect tags are hard to eliminate completely once a corpus reaches a certain size. Second, some verbs appear on this list because their ditransitive uses are very restricted, in some cases to a single fixed expression (in this case, keep s.o. company). Finally, most verbs appear on this list because they occur very frequently with other complementation patterns (this is most obvious for the high-frequency verbs make and do, but it is also true of find). What one can take away from a discussion of such cases is, first, that fixed expressions must be taken into account in any linguistic analysis and, second, that complementation patterns exhibit a certain amount of productivity, occurring at least occasionally with verbs that predominantly occur in other patterns. Both facts are unsurprising from the perspective of construction grammar, within which collostructional analysis was first developed.
However, the data in Table 3 do not yet speak directly to the issue of negative evidence: a further step is necessary. In our previous work, we have applied the term 'repelled' only to those words that do occur in a given construction but do so less frequently than expected. However, as we noted in passing in our first paper (cf. Stefanowitsch and Gries 2003: 238), it is possible, and perhaps logical, to include in this category words that would have been expected to occur in the construction based on their overall frequency in the corpus but did not, in fact, occur in the construction at all. This is the step that finally takes us to the issue of negative evidence: the range of frequencies of occurrence that can be evaluated for statistical significance includes the limiting case of zero. In other words, the non-occurrence of a particular configuration of linguistic categories (for example, of a particular verb in a particular construction) can be compared to its expected frequency of occurrence. In many cases, this will allow us to determine whether an unseen construction is likely to be a possible construction of a language or not.

Consider Table 4, which shows the contingency tables for three verbs that do not occur with ditransitive complementation in the ICE-GB: say, explain, and whisper.

Table 4.
Three verbs that do not occur with ditransitive complementation in the ICE-GB (expected frequencies in parentheses)

a. say
            Ditransitive   ¬Ditransitive   Total
say         0 (44.52)      3,333           3,333
¬say        1,824          131,394         133,218
Total       1,824          134,727         136,551

b. explain
            Ditransitive   ¬Ditransitive   Total
explain     0 (2.30)       172             172
¬explain    1,824          134,555         136,379
Total       1,824          134,727         136,551

c. whisper
            Ditransitive   ¬Ditransitive   Total
whisper     0 (0.07)       5               5
¬whisper    1,824          134,722         136,546
Total       1,824          134,727         136,551
On a priori grounds, we might expect all three verbs to allow ditransitive complementation. They are all reasonably close in meaning to tell, one of the most strongly attracted collexemes of this pattern, and to other verbs of communication among the significantly attracted collexemes (e.g., ask, inform, teach, assure). On the other hand, they are textbook cases in the linguistic literature of verbs not allowing ditransitive complementation (cf., e.g., Pinker 1989).
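When the observed frequency is zero, the one-tailed Fisher-Yates test reduces to a single hypergeometric probability, since no smaller count is possible. The following is a sketch of that arithmetic in my own notation, which is not the paper's: f_v is the verb's corpus frequency, f_ditr the number of ditransitives, and N the number of complementation patterns. It uses the explain figures from Table 4b.

```latex
\begin{aligned}
E_v &= \frac{f_v \, f_{\mathrm{ditr}}}{N}, \\
p_{\text{zero}} = P(X = 0) &= \frac{\binom{N - f_{\mathrm{ditr}}}{f_v}}{\binom{N}{f_v}}
  = \prod_{i=0}^{f_v - 1} \frac{N - f_{\mathrm{ditr}} - i}{N - i}, \\
E_{\textit{explain}} &= \frac{172 \times 1824}{136551} \approx 2.30,
\qquad P(X = 0) \approx \Bigl(1 - \tfrac{1824}{136551}\Bigr)^{172} \approx 0.099.
\end{aligned}
```

The approximation in the last line treats each of the verb's tokens as an independent draw. With a corpus this large, it agrees with the exact product to the precision reported.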
Table 4a provides conclusive evidence that the linguistic literature is right in the case of say, whose repulsion strength meets the corrected level of significance (p = 1.96E-20; < 1.03E-05). We can confidently claim that the combination [say ditransitive] is significantly absent. In the case of explain, the repulsion strength does not meet even the uncorrected level (p = 0.099; > 0.05), although it is not too far off. The verb is simply not frequent enough in the ICE-GB to let us determine whether its non-occurrence is accidental or significant, although its marginal statistical significance may lead us to suspect the latter. No such suspicion would be warranted in the case of whisper, whose non-occurrence is well within the range of accidental variation (p = 0.935; > 0.05).
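The three p-values for the zero counts in Table 4 can be checked with a running product of hypergeometric factors. This is a minimal sketch, and the constant and function names are illustrative. The running product avoids the very large integers that the binomial coefficients would otherwise require.

```python
ICE_GB_TOTAL = 136_551   # all complementation patterns in the ICE-GB (Table 4)
ICE_GB_DITR = 1_824      # ditransitive complementation patterns


def p_zero(verb_freq, total=ICE_GB_TOTAL, in_cxn=ICE_GB_DITR):
    """Probability of seeing the verb 0 times in the construction by chance.

    Hypergeometric P(X = 0) = C(total - in_cxn, verb_freq) / C(total, verb_freq),
    computed as a running product of ratios.
    """
    p = 1.0
    for i in range(verb_freq):
        p *= (total - in_cxn - i) / (total - i)
    return p


for verb, freq in [("say", 3333), ("explain", 172), ("whisper", 5)]:
    expected = freq * ICE_GB_DITR / ICE_GB_TOTAL
    print(f"{verb:8} expected={expected:6.2f} p={p_zero(freq):.3g}")
```

The sketch should yield values close to the 1.96E-20, 0.099, and 0.935 reported above.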
Before discussing these issues any further, let us look at the results we get when we apply simple collexeme analysis exploratively to all verbs that occur in the ICE-GB but not in the ditransitive. There are 4,856 verb types in the ICE-GB (according to my definition, which lists phrasal verbs as separate types; see Footnote 4). Of these, 4,782 do not occur in the ditransitive. In turn, this non-occurrence is significant for only 53 verbs, of which only 11 meet the corrected level of significance. Table 5 shows the significantly repelled collexemes.
Table 5.
Significantly repelled collexemes of the ditransitive in the ICE-GB
Collexeme    F(Corpus)   F_O(Ditr)   F_E(Ditr)   p (Fisher-Yates exact)
be           25,416      0           340.00      4.29E-165
be|have      6,261       0           83.63       3.66E-38
have         4,303       0           57.48       2.90E-26
think        3,335       0           44.55       1.90E-20
say          3,333       0           44.52       1.96E-20
know         2,120       0           28.32       3.32E-13
see          1,971       0           26.33       2.54E-12
go           1,900       0           25.38       6.69E-12
want         1,256       0           16.78       4.27E-08
use          1,222       0           16.32       6.77E-08
come         1,140       0           15.23       2.06E-07
look         1,099       0           14.68       3.59E-07
Significant at uncorrected significance levels:
try          749         0           10.00       4.11E-05
mean         669         0           8.94        1.21E-04
work         646         0           8.63        1.65E-04
like         600         0           8.01        3.08E-04
feel         593         0           7.92        3.38E-04
become       577         0           7.71        4.20E-04
happen       523         0           6.99        8.70E-04
put          513         0           6.85        9.96E-04
talk         490         0           6.55        1.36E-03
hear         483         0           6.45        1.49E-03
need         420         0           5.61        3.49E-03
believe      397         0           5.30        4.76E-03
provide      380         0           5.08        5.99E-03
live         378         0           5.05        6.16E-03
remember     373         0           4.98        6.59E-03
produce      328         0           4.38        1.21E-02
speak        323         0           4.31        1.29E-02
hope         316         0           4.22        1.42E-02
run          309         0           4.13        1.56E-02
change       306         0           4.09        1.63E-02
meet         303         0           4.05        1.69E-02
help         301         0           4.02        1.74E-02
start        294         0           3.93        1.91E-02
move         291         0           3.89        1.99E-02
seem         285         0           3.81        2.16E-02
agree        279         0           3.73        2.34E-02
lead         271         0           3.62        2.60E-02
expect       265         0           3.54        2.82E-02
consider     264         0           3.53        2.86E-02
suggest      259         0           3.46        3.06E-02
describe     259         0           3.46        3.06E-02
decide       259         0           3.46        3.06E-02
understand   250         0           3.34        3.46E-02
hold         249         0           3.33        3.50E-02
require      244         0           3.26        3.75E-02
involve      242         0           3.23        3.85E-02
suppose      241         0           3.22        3.90E-02
include      236         0           3.15        4.17E-02
occur        233         0           3.11        4.35E-02
develop      233         0           3.11        4.35E-02
go on        231         0           3.09        4.46E-02
follow       227         0           3.03        4.71E-02
Two things about this table require discussion. First, it demonstrates that even a one-million-word corpus is too small to allow us to identify significant absences for more than a handful of cases (at least for a relatively rare pattern such as ditransitive complementation). I will discuss this problem in the remainder of this section and in the next section. Second, the results only tell us that a particular structure is significantly absent; as pointed out in the introduction, they do not tell us why it is significantly absent. I will return to this problem in the final section.
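The exploratory screen behind Table 5 can be sketched as a loop that computes the zero-count probability for each verb and sorts the verbs by significance level. This is an illustration under stated assumptions. The sample list is a handful of frequencies taken from Tables 4 and 5, whereas the actual screen covered all 4,782 verb types with no ditransitive tokens. The helper `screen` is hypothetical.

```python
TOTAL, DITR = 136_551, 1_824     # ICE-GB complementation patterns / ditransitives
ALPHA = 0.05
CORRECTED = ALPHA / 4_856        # Bonferroni over 4,856 verb types


def p_zero(freq):
    """Hypergeometric probability that a verb with `freq` tokens never
    occurs in the ditransitive by chance alone."""
    p = 1.0
    for i in range(freq):
        p *= (TOTAL - DITR - i) / (TOTAL - i)
    return p


def screen(zero_verbs):
    """Classify verbs with zero ditransitive tokens by significance level."""
    result = {"corrected": [], "uncorrected": [], "n.s.": []}
    for verb, freq in zero_verbs:
        p = p_zero(freq)
        if p < CORRECTED:
            result["corrected"].append(verb)
        elif p < ALPHA:
            result["uncorrected"].append(verb)
        else:
            result["n.s."].append(verb)
    return result


# Corpus frequencies taken from Tables 4 and 5
sample = [("be", 25_416), ("say", 3_333), ("try", 749),
          ("follow", 227), ("explain", 172), ("whisper", 5)]
print(screen(sample))
```

The sample illustrates the corpus-size problem. Only verbs with several hundred tokens or more reach even the uncorrected level when their count is zero.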
The problem of insufficient corpus size can ultimately only be solved by the creation of larger grammatically annotated corpora. However, in many individual cases it is possible to arrive at a fairly safe conclusion using currently available non-annotated corpora. Take the case of explain. In the 100-million-word British National Corpus (the largest balanced corpus of British English currently available), the verb explain occurs 18,334 times, but not once with ditransitive complementation. In Stefanowitsch and Gries (2003: 219), we estimated that the BNC contains 10,206,300 complementation patterns overall. If we assume that the proportion of ditransitives in the BNC is the same as in the ICE-GB, then the BNC should contain 136,332 ditransitives (10,206,300 × 1,824 / 136,551). Given these figures, we can now calculate the expected frequency of occurrence of explain in the ditransitive: 18,334 × 136,332 / 10,206,300 ≈ 245. The difference between this and the observed frequency of zero is highly significant (at uncorrected levels of significance; p = 6.73E-108; < 0.001). Thus, the combination [explain ditransitive] can be categorized as a significantly absent structure based on negative corpus evidence. This strategy works even with a low-frequency verb like whisper, which occurs only 2,976 times in the BNC but, again, does not occur in the ditransitive. Under the assumptions just outlined, the expected frequency of [whisper ditransitive] is 40. Again, the difference is highly significant (at uncorrected levels; p = 4.139019E-18; < 0.001).⁶ Repeating individual tests on a larger corpus will, of course, not invariably lead to the conclusion that a given structure is significantly absent. In many cases, the frequency of a verb will remain too low to yield significant results. For example, the verb oxidise, like whisper, occurs five times in the ICE-GB, never in the ditransitive. In the BNC, it occurs 99 times, also never in the ditransitive. The expected frequency in this case is 1 (1.32, to be precise), and the difference between this and the observed frequency of zero is still far too small to reach statistical significance (p = 0.26; > 0.05). In other cases, extending a search to a larger corpus will fail to replicate the zero occurrence found in the smaller corpus. For example, the BNC contains one clear example of donate with ditransitive complementation (1a), and a second potential one (1b):
(1) a. Saudi king donates Laura transplant money. The king of Saudi Arabia has donated a hundred and fifty thousand pounds to Laura Davies ... (K1N)
b. ... if the villagers hadn't so kindly donated her furnishings, she'd probably still be existing in empty rooms ... (H95).⁷
Faced with such examples, there is no longer any reason to believe that [donate ditransitive] is a significantly absent structure.
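The BNC extrapolation for explain, whisper, and oxidise can be sketched by scaling the ICE-GB proportion of ditransitives to the estimated BNC total and then reusing the zero-count probability. The key assumption is the one stated in the text: ditransitives make up the same share of complementation patterns in both corpora. The constant names are illustrative.

```python
ICE_TOTAL, ICE_DITR = 136_551, 1_824     # ICE-GB (Table 1)
BNC_TOTAL = 10_206_300                   # estimated BNC complementation patterns

# Assumption from the text: same share of ditransitives in the BNC as in the ICE-GB
BNC_DITR = round(BNC_TOTAL * ICE_DITR / ICE_TOTAL)    # 136,332


def p_zero(freq, total=BNC_TOTAL, in_cxn=BNC_DITR):
    """Hypergeometric P(X = 0) for a verb with `freq` tokens."""
    p = 1.0
    for i in range(freq):
        p *= (total - in_cxn - i) / (total - i)
    return p


for verb, freq in [("explain", 18_334), ("whisper", 2_976), ("oxidise", 99)]:
    expected = freq * BNC_DITR / BNC_TOTAL
    print(f"{verb:8} expected={expected:7.2f} p={p_zero(freq):.3g}")
```

Under these assumptions, the expected frequencies come out at roughly 245, 40, and 1.32. The p-values fall near 6.7E-108, 4.1E-18, and 0.26, matching the figures above.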
Thus, the methodological problem of insufficiently large corpora is not, in principle, an argument against replacing intuitive grammaticality judgments with negative corpus evidence (in practice it may be, a point to which I will return below). Instead, the preceding discussion shows that negative corpus evidence can be adduced for a pet case of syntactic theorizing. Note that the significantly absent collexemes of the ditransitive complementation pattern in Table 5 include a large number of famously non-ditransitive verbs, e.g., suggest, provide, say, and describe.
One may now argue that even if such negative corpus evidence can be obtained, it does not add any insights that could not also be arrived at by introspective acceptability judgments. At best (e.g., for say, explain, whisper), it will confirm what we know from intuition anyway. In the worst case, it will never yield enough data to decide the issue (e.g., oxidise), or it will contradict generally agreed-upon acceptability judgments (e.g., donate). There are two reasons why this argument is wrong. First, unlike acceptability judgments, negative corpus evidence meets the standards of scientific research. Second, it is only such corpus evidence that will allow us to make principled statements about what is and is not possible. If we hypothesize, based on acceptability judgments, that [whisper ditransitive] is impossible, a single counterexample can prove this wrong. While such counterexamples may not occur even in very large corpora, they are still easy to come by in the age of the Internet. A web search quickly turns up counterexamples produced by native speakers of (British) English, both with clausal and, more crucially, with nominal objects:
(2) a. ... when I first beheld you the instinct of Nature whispered me that we were in some degree related ... (Jane Austen, Love and Friendship, Letter 11)
b. She had not been allowed to ... to bury the two people she had loved most in the world ... to whisper them a last goodbye. (Meg Hutchinson, Peppercorn Woman)
Of course, these examples can in turn be questioned: they may reflect older stages of the language (Jane Austen wrote Love and Friendship around 1790), they may reflect regional dialects (Meg Hutchinson lives in Staffordshire), and so on. However, the fact remains that we have objective, authentic examples pitched against subjective intuition. In contrast, the negative corpus evidence obtained from the BNC gives us an objective basis for arguing that, even though utterances like (2) can occur, they are very marginal. This allows us to uphold the useful generalization that communication verbs can be used with ditransitive syntax if they refer to the type of message communicated, but not if they refer to the manner in which it is communicated (Pinker 1989), even though we have to reformulate it as a strong statistical tendency instead of a categorical constraint. The same holds for donate. Even if we generously admit both (1a) and (1b) as counterexamples, we can still point out that donate would have been expected to occur 14 times (based on the assumptions above), and thus still constitutes a strongly repelled collexeme (p = 0.0001; [...]
whether zero deviates significantly from this expected frequency (the sufficient condition for upholding the hypothesis). This information will likely be more difficult to obtain or estimate than information about complementation patterns, but to do so is by no means impossible.
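The test described in this paragraph can be illustrated with a minimal Python sketch. It assumes that corpus size, construction frequency and word frequency are counted over the same units (for example, verb tokens) and that significance is assessed with a one-tailed Fisher exact test, as in collostructional analysis. When the observed co-occurrence count is zero, the lower-tail p-value reduces to the hypergeometric probability of observing no co-occurrences at all. The function names and the counts in the usage lines are hypothetical, chosen for illustration, and do not come from the BNC.

```python
import math


def expected_frequency(word_freq: int, cx_freq: int, corpus_size: int) -> float:
    """Co-occurrence count expected if word and construction were independent."""
    return word_freq * cx_freq / corpus_size


def p_absent(word_freq: int, cx_freq: int, corpus_size: int) -> float:
    """One-tailed Fisher exact p-value for an observed co-occurrence count of zero.

    With zero observations the lower tail is just P(X = 0) under the
    hypergeometric distribution:
        C(N - n_w, n_c) / C(N, n_c)
      = (N - n_c)! (N - n_w)! / (N! (N - n_c - n_w)!)
    evaluated with log-gamma so that corpus-sized counts do not overflow.
    """
    n, n_c, n_w = corpus_size, cx_freq, word_freq
    if n_w + n_c > n:
        return 0.0  # word and construction cannot be disjoint: zero is impossible
    log_p = (math.lgamma(n - n_c + 1) + math.lgamma(n - n_w + 1)
             - math.lgamma(n + 1) - math.lgamma(n - n_c - n_w + 1))
    return math.exp(log_p)


# Hypothetical counts, for illustration only (not taken from the BNC).
N, CX, WORD = 1_000_000, 10_000, 2_000
print(expected_frequency(WORD, CX, N))  # 20.0
print(p_absent(WORD, CX, N))
```

Working in log space with `math.lgamma` keeps the calculation stable for corpus-sized counts, where the binomial coefficients themselves would be astronomically large.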
Any hypothesis about possible and impossible structures in language is ultimately a hypothesis about the incompatibility of two (or more) linguistic categories. As long as these categories can be operationalized in such a way that they can be exhaustively annotated (or identified spontaneously) in a corpus of naturally occurring language, and as long as the corpus is large enough, this corpus can provide both positive and negative evidence. The first condition should always be met: if a category cannot be operationalized for objective identification, it has no place in a linguistic theory. The second condition is not currently met. There are several syntactically annotated corpora (for example, the Penn Treebank, Sampson’s Susanne and Christine corpora, and the ICE-GB used in this note), but they are either too small for many research questions, or their annotation scheme is too coarse or too unreliable, or both. However, this cannot seriously be used as a defense of the introspective method. Instead, it must be used as an argument for the funding and the human resources necessary for the construction of large grammatically annotated corpora. A discipline can only get so far by thought experiments (if that is what acceptability judgments are). It begins to make substantial headway only when it faces up to the problem of data scarcity and solves it. Astronomers have built radio telescopes, physicists have built particle colliders, and geneticists have sequenced the human genome; linguists should be able to construct large, balanced, syntactically annotated corpora of at least the world’s major languages. But even until this goal is reached (or, more likely, in case it is never reached), corpora can yield both positive and negative evidence for the construction of linguistic theories.
Final remarks: the occurring and the non-occurring

The main point of this note was to show that corpora contain negative evidence and that this negative corpus evidence can, and should, replace introspective acceptability judgments. It seems appropriate, however, to discuss the most important theoretical implications of such a step.

First, from the perspective advocated here, the non-occurrence of a particular linguistic structure is merely the limiting case; it is not qualitatively different from very rare occurrences. This may seem to be a problem for an approach that argues for an absolute distinction between possible and impossible configurations of linguistic categories (for example, between grammatical and ungrammatical structures). This problem
may be more apparent than real, however. The continuum between significantly rare and significantly absent structures is not fundamentally different from the continuum between various degrees of unacceptability that is regularly found for acceptability ratings. In both cases, the data must be viewed in light of one’s theory of language in order to make sense of this continuum. Also, it may well be possible to identify a degree of improbability that is close enough to impossibility to be indistinguishable from it.
Second, while the statistically significant absence (or rareness) of a particular configuration of grammatical categories can be taken as evidence that this configuration is impossible (i. e., very improbable), it does not, in itself, provide any clues as to why this should be the case. Again, the same is true of introspective judgments. Chomsky pointed this out early on: “The notion ‘acceptable’ is not to be confused with ‘grammatical’. Acceptability belongs to the study of performance whereas grammaticalness belongs to the study of competence” (Chomsky 1965: 11). A linguistic structure may give rise to introspective judgments of unacceptability for a number of reasons, of which ungrammaticality (or, more generally, failure to conform to general linguistic rules) is just one. What that reason is must be determined independently of the acceptability judgment. The same is true of significantly absent (or rare) structures: determining significant absence/rareness is just the first step of a linguistic analysis. The second step is to determine the reasons for the significant absence/rareness. This step can be much closer to traditional linguistic argumentation. First, it may involve the search for authentic counterexamples (as in the case of whisper above) in order to test the extent of this absence. This may uncover variation in the data (panchronic, regional, social, etc.) or particular contexts in which seemingly impossible structures become possible. Second, it may involve constructing examples in order to determine whether the significant absence is semantically determined. If the constructed examples are not interpretable, the absence may simply be due to semantic incompatibility. For example, no interpretation can be assigned to He knew her the answer or She saw him the light. If the constructed examples are interpretable, their absence cannot be due to semantic incompatibility but may instead have purely formal reasons. For example, He said her the answer or She put him the book are straightforwardly interpretable (of course, there may be more fine-grained semantic restrictions, as the huge literature on ditransitives shows). In other words, while I argue against the use of acceptability judgments as a linguistic method, I do not argue against the use of interpretation. There is good reason for this distinction, which I am not the first to point out: interpreting utterances is a natural human activity; judging their acceptability is not.
Third, while it is plausible to speak of different degrees of attraction or repulsion in the case of combinations that do occur, it is less clear whether it makes sense to speak of different degrees of absence, as the ranking of significantly absent collexemes in Table 5 suggests. Methodologically, this ranking merely reflects the certainty with which we can say that a structure is impossible. One may (but need not) argue, though, that this certainty reflects the certainty of a native speaker, in which case the ‘degrees of absence’ do become relevant to theoretical considerations. Whether the predictions of such a view are borne out by empirical data remains to be seen.
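The observation that the ranking of significantly absent collexemes reflects the certainty of the absence can be made concrete. In the following minimal Python sketch (verb labels and counts are invented purely for illustration, and a one-tailed hypergeometric test is assumed), words that never occur in a construction are ranked by the probability of their zero count. For a fixed construction and corpus, each additional occurrence of a word elsewhere multiplies that probability by a factor below one, so among zero-count words the ranking simply follows overall corpus frequency.

```python
import math


def p_absent(word_freq: int, cx_freq: int, corpus_size: int) -> float:
    """Hypergeometric P(X = 0): chance of zero co-occurrences under independence."""
    n, n_c, n_w = corpus_size, cx_freq, word_freq
    if n_w + n_c > n:
        return 0.0
    return math.exp(math.lgamma(n - n_c + 1) + math.lgamma(n - n_w + 1)
                    - math.lgamma(n + 1) - math.lgamma(n - n_c - n_w + 1))


def rank_absent(word_freqs: dict[str, int], cx_freq: int,
                corpus_size: int) -> list[tuple[str, float]]:
    """Rank words with zero co-occurrences, most certain absence (smallest p) first."""
    scored = [(w, p_absent(f, cx_freq, corpus_size)) for w, f in word_freqs.items()]
    return sorted(scored, key=lambda item: item[1])


# Invented labels and counts, for illustration only.
freqs = {"verb_a": 500, "verb_b": 2_000, "verb_c": 1_200}
for word, p in rank_absent(freqs, cx_freq=10_000, corpus_size=1_000_000):
    print(f"{word}\t{p:.2E}")
```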
More generally, it seems to me that accepting the methodology I have argued for here may lead to a slight but pervasive reorientation of linguistic theory. If we accept significant presence and significant absence (as well as significant frequency and rareness) as the primary facts that a linguistic theory must explain, then this theory will have to be broader than most current theories. Rather than focusing exclusively on grammaticality, such a theory would have to uncover the whole range of causes for the presence and absence of linguistic structures and investigate all of them with the same degree of rigor and explicitness. The aim of linguistic analysis would no longer be “to separate the grammatical sequences which are the sentences of [a language] L from the ungrammatical sequences which are not sentences of L and to study the structure of the grammatical sentences” (Chomsky 1957: 13). Instead, the aim would be to provide for individual languages and, ultimately, for language in general a comprehensive theory of the occurring and the non-occurring.⁸
Received January 2006
Revisions received March 2006
Final acceptance March 2006
University of Bremen
Notes

* I would like to thank Stefan Gries, Arne Zeschel and the participants of the 7th Norddeutsches Linguistisches Kolloquium for their comments on the ideas presented in this paper. Any conceptual errors are mine alone.
1. Actually, there are several potential reasons for the oddness of McEnery and Wilson’s example (for example, the use of the simple present and the potential violation of the selection restrictions of the verb shine by the direct object NP books). Their discussion suggests, however, that they are concerned with complementation.
2. An overview of this method and its place in the corpus-based study of grammatical patterns will be provided in Stefanowitsch and Gries (to appear b); meanwhile, an introduction can be found on my website at . This website also provides a number of Perl scripts for doing collostructional analysis (PerlClx 1.0); cf. also Gries’ R script CollAnalysis 3, available from his website at . Incidentally, both scripts can provide the corpus frequency that a word not occurring in a particular construction must have in order for its absence to be significant given the frequency of the construction and the size of the corpus. CollAnalysis provides this information as part of a collostructional analysis; PerlClx contains a script (zclx.pl) exclusively dedicated to this purpose.
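The calculation mentioned at the end of this note (the corpus frequency a non-occurring word must reach before its absence becomes significant) can be sketched as a search over word frequencies. This is not the code of zclx.pl or CollAnalysis, whose internals are not described here; it is a minimal Python sketch assuming a one-tailed hypergeometric (Fisher exact) test at a chosen level alpha, and the function names are hypothetical. Because the probability of a zero count falls monotonically as word frequency rises, a binary search finds the threshold.

```python
import math
from typing import Optional


def p_absent(word_freq: int, cx_freq: int, corpus_size: int) -> float:
    """Hypergeometric P(X = 0) for a word/construction pair under independence."""
    n, n_c, n_w = corpus_size, cx_freq, word_freq
    if n_w + n_c > n:
        return 0.0
    return math.exp(math.lgamma(n - n_c + 1) + math.lgamma(n - n_w + 1)
                    - math.lgamma(n + 1) - math.lgamma(n - n_c - n_w + 1))


def min_freq_for_significant_absence(cx_freq: int, corpus_size: int,
                                     alpha: float = 0.05) -> Optional[int]:
    """Smallest word frequency at which zero co-occurrences give p < alpha.

    Returns None if even the largest frequency compatible with zero
    co-occurrences (corpus_size - cx_freq) does not reach significance.
    """
    lo, hi = 0, corpus_size - cx_freq
    if p_absent(hi, cx_freq, corpus_size) >= alpha:
        return None
    while lo < hi:  # invariant: p_absent(hi) < alpha
        mid = (lo + hi) // 2
        if p_absent(mid, cx_freq, corpus_size) < alpha:
            hi = mid
        else:
            lo = mid + 1
    return lo
```

Under these assumptions, a call such as `min_freq_for_significant_absence(cx_freq, corpus_size)` returns the threshold; a word above it that never occurs in the construction would count as significantly absent at the chosen level.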
3. The Bonferroni correction is meant to place stricter requirements on statistical significance in situations where multiple tests are performed on the same data set: obviously, the more tests you perform, the more chances there are for a seemingly significant result to come about by accident. However, some have argued (for example, Perneger 1998) that this correction does more harm than good because it removes many results that are significant. I will not place too much emphasis here on this correction. In a sense, whether one has to apply it in the context of collostructional analysis or not depends on one’s view of what one is doing: is one testing individual word-construction pairs (in which case each test can stand on its own and one could be less concerned with correcting) or is one testing a construction and all words occurring in it (in which case one could be more concerned with correcting)?
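The intuition in this note, that more tests mean more chances of an accidental ‘significant’ result, can be quantified. Assuming the tests are independent, the probability of at least one false positive among m tests at level alpha is 1 - (1 - alpha)^m; the Bonferroni correction tests each at alpha/m instead, which keeps that probability at or below alpha (a bound that holds even without the independence assumption). A minimal Python sketch:

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """P(at least one false positive) for n_tests independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


for m in (1, 10, 100):
    uncorrected = familywise_error_rate(0.05, m)
    corrected = familywise_error_rate(0.05 / m, m)  # Bonferroni: alpha / m per test
    print(f"{m:>3} tests: uncorrected {uncorrected:.3f}, Bonferroni {corrected:.3f}")
```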
4. These results differ from those that Gries and I have presented previously, mainly because we have focused exclusively on ditransitives with two nominal objects, whereas I have included here all uses tagged as ‘ditransitive’ in the ICE-GB, including those with a clausal direct object. Furthermore, here and throughout the following discussion I have used regular expressions to search the corpus files directly, rather than using ICECUP, the software tool that accompanies the ICE-GB. I have also discarded all verbs marked as ‘ignored’ in the corpus annotation and I have discarded all unclear words. I have manually lemmatized the verbs and standardized spelling variants. Finally, I treat phrasal verbs as lemmas in their own right (cf. give back in Table 2); they were identified by searching for verbs that were followed (and in some cases preceded) by a particle annotated as such.
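The procedure in this note (searching corpus files with regular expressions, lemmatizing verbs, and treating a verb plus a particle as a lemma in its own right) might look roughly like the following minimal Python sketch. The `word_TAG` token format, the tags `V` and `PRT`, and the small lemma table are all hypothetical simplifications, not the ICE-GB’s actual annotation scheme; the sketch also handles only particles that directly follow the verb, not the preceding cases the note mentions.

```python
import re

# Hypothetical token format "word_TAG" (not the ICE-GB's real markup).
TOKEN = re.compile(r"(\S+)_([A-Z]+)")

# Hypothetical stand-in for manual lemmatization and spelling standardization.
LEMMA = {"gave": "give", "gives": "give", "given": "give", "giving": "give",
         "told": "tell", "tells": "tell"}


def verb_lemmas(line: str) -> list[str]:
    """Return verb lemmas, merging a verb with an immediately following particle."""
    tokens = TOKEN.findall(line)
    lemmas = []
    i = 0
    while i < len(tokens):
        word, tag = tokens[i]
        if tag == "V":
            lemma = LEMMA.get(word.lower(), word.lower())
            if i + 1 < len(tokens) and tokens[i + 1][1] == "PRT":
                lemma += " " + tokens[i + 1][0].lower()  # phrasal verb, e.g. "give back"
                i += 1  # skip the particle just consumed
            lemmas.append(lemma)
        i += 1
    return lemmas


print(verb_lemmas("She_PRON gave_V back_PRT the_ART money_N"))  # ['give back']
```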
5. Note that cost does not behave like a typical transitive verb (whether in its monotransitive or its ditransitive use). For example, it cannot be passivized: *Three quid were cost (them), *They were cost three quid. Thus, the apparent direct object may be better analyzed as an oblique (for example, a subject complement or an adjunct, cf. e. g., Quirk et al. 1985, § 16.27).
6. Since we are conducting individual tests here based on hypotheses about specific verbs, we could argue that the levels of significance do not have to be adjusted for multiple testing. However, since there are thousands of tests of the same kind (verb ditransitive) that we could have performed, it might be a good idea to correct for multiple testing anyway. According to Leech et al.’s (2001) frequency list, there are 38,019 verb types in the BNC; if anything, this is an overestimation (since inaccurately tokenized forms like ["see] [—see] [see?] etc. are all counted as their own lemmas), so if we correct on this basis, we are on the safe side. The corrected level of significance is 1.32E-06; both explain and whisper clear this level by several orders of magnitude.
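The arithmetic in this note can be checked directly. The following minimal Python sketch assumes an uncorrected level of alpha = 0.05 (not stated in the note, but 0.05 divided by 38,019 reproduces the corrected value given) and treats the 38,019 verb types from Leech et al. (2001) as the number of tests. The helper `orders_below` is a hypothetical convenience for expressing how far a p-value clears the corrected level.

```python
import math

ALPHA = 0.05          # assumed uncorrected significance level
VERB_TYPES = 38_019   # verb types in the BNC according to Leech et al. (2001)

corrected = ALPHA / VERB_TYPES   # Bonferroni-corrected level
print(f"{corrected:.2E}")        # 1.32E-06


def orders_below(p_value: float, level: float) -> float:
    """How many orders of magnitude a p-value lies below a significance level."""
    return math.log10(level / p_value)
```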
7. Due to the ambiguity of the pronominal form her, it is not clear whether this example is monotransitive (They donated [NP her furnishings]) or ditransitive (They donated [NP her] [NP furnishings]). However, a web search turns up additional clear (if rare) examples of ditransitive uses of donate, for example, In May 2004, Cycle Heaven, a local retailer, and City of York Council threw down the
gauntlet to local schools, saying ‘achieve your target increases in walking and cycling by summer 2005, and we will donate you a free, high quality children’s bike!’ (http://www.york.gov.uk/cgi-bin/wn_document.pl?type=5927).
8. I do not want to conclude this note without applying the by now familiar reasoning to McEnery and Wilson’s question of how the ungrammaticality of *He shines Tony books could be determined without intuition judgments. Shine (in all its senses) occurs 2,258 times in the BNC. On the assumptions made above, the expected frequency of shine with ditransitive complementation would be 30; the observed frequency is zero. This difference is highly significant (p = 6.47E-14) even if we correct for multiple testing. Thus, without resorting to introspection, we have proved that [shine ditransitive] is significantly absent. Whether this is strictly due to ungrammaticality is doubtful: first, McEnery and Wilson’s sentence is interpretable (albeit weird); second, it is possible to find authentic ditransitive uses by certified native speakers of English for both senses of shine: (i) Shine me a light from your eyes dear (Christine McVie, Show me a Smile [performed by Fleetwood Mac]); (ii) He smiles telling him to shine him a metallic Purple armor (Jimi Hendrix, Bold as Love). Thus, we could hypothesize that ditransitive uses of shine are semantically so restricted that they occur only in very specific circumstances (e. g., the ‘light’ reading can only occur ditransitively when the direct object is light) or that they only occur in certain dialects (e. g., Lancashire [McVie] and Seattle [Hendrix]) or registers (e. g., rock lyrics).
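The size of the p-value reported for shine can be sanity-checked with a back-of-the-envelope approximation. This is a swapped-in technique, not the exact test used in this note: if ditransitive uses of shine are treated as Poisson-distributed with the expected frequency of 30, the probability of observing none is e^(-30). Because corpus sampling is without replacement, such a Poisson figure tends to overstate the exact hypergeometric probability somewhat, so it should only be expected to agree with the reported value in order of magnitude. A minimal Python sketch:

```python
import math

EXPECTED = 30                # expected frequency of ditransitive shine (note 8)
CORRECTED_LEVEL = 1.32e-06   # Bonferroni-corrected level (note 6)

p_poisson = math.exp(-EXPECTED)     # Poisson probability of zero occurrences
print(f"{p_poisson:.2E}")           # 9.36E-14
print(p_poisson < CORRECTED_LEVEL)  # True
```

The approximation lands at roughly 10^-13 to 10^-14, the same order of magnitude as the exact value reported, and far below the corrected significance level.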
References

Bonferroni, Carlo E.
1936 Teoria statistica delle classi e calcolo delle probabilità. Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commerciali di Firenze 8, 3–62.
Chomsky, Noam
1957 Syntactic Structures. The Hague: Mouton.
1965 Aspects of the Theory of Syntax. Cambridge, MA: MIT Press.
Gries, Stefan Th. and Anatol Stefanowitsch
2004a Extending collostructional analysis: a corpus-based perspective on ‘alternations’. International Journal of Corpus Linguistics 9(1), 97–129.
2004b Co-varying collexemes in the into-causative. In: Achard, Michel and Suzanne Kemmer (eds.), Language, Culture, and Mind. Stanford: CSLI, 225–236.
To appear Cluster analysis and the identification of collexeme classes. In: Newman, John and Sally Rice (eds.), Empirical and Experimental Methods in Cognitive/Functional Research. Stanford: CSLI.
Leech, Geoffrey, Paul Rayson, and Andrew Wilson
2001 Word Frequencies in Written and Spoken English: Based on the British National Corpus. London: Longman.
McEnery, Tony, and Andrew Wilson
2001 Corpus Linguistics. An Introduction. Second edition. Edinburgh: Edinburgh University Press.
Nelson, Gerald, Sean Wallis and Bas Aarts (eds.)
2002 Exploring Natural Language: Working with the British Component of the International Corpus of English. Amsterdam and Philadelphia: John Benjamins.
Pedersen, Ted
1996 Fishing for exactness. Proceedings of the South Central SAS User’s Group Conference, Austin, TX, 188–200.
Perneger, Thomas V.
1998 What’s wrong with Bonferroni adjustments. British Medical Journal 316, 1236–1238.
Pinker, Steven
1989 Learnability and Cognition. The Acquisition of Argument Structure. Cambridge, MA: MIT Press.
Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech, and Jan Svartvik
1985 A Comprehensive Grammar of the English Language. London: Longman.
Stefanowitsch, Anatol
2005 New York, Dayton (Ohio), and the Raw Frequency Fallacy. Corpus Linguistics and Linguistic Theory 1(2), 295–301.
Stefanowitsch, Anatol and Stefan Th. Gries
2003 Collostructions: investigating the interaction of words and constructions. International Journal of Corpus Linguistics 8(2), 209–243.
2005 Covarying Collexemes. Corpus Linguistics and Linguistic Theory 1(1), 1–43.
To appear a Channel and constructional meaning: A collostructional case study. In: Kristiansen, Gitte and René Dirven (eds.), Cognitive Sociolinguistics: Language Variation, Cultural Models, Social Systems. Berlin and New York: Mouton de Gruyter.
To appear b Corpora and Grammar. In: Lüdeling, Anke, Merja Kytö, and Tony McEnery (eds.), Corpus Linguistics (Handbooks of Linguistics and Communication Science/HSK). Berlin: Mouton de Gruyter.