Negative evidence and the raw frequency fallacy* - CiteSeerX
72 A. Stefanowitsch
whether zero deviates significantly from this expected frequency (the sufficient
condition for upholding the hypothesis). This information will
likely be more difficult to obtain or estimate than information about
complementation patterns, but to do so is by no means impossible.
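The test described here, whether an observed frequency of zero deviates significantly from the expected frequency, can be illustrated with a minimal sketch. It assumes that the corpus has been exhaustively annotated for two categories A and B, and it takes the hypergeometric probability of zero co-occurrences as the one-tailed p-value (the lower tail of a Fisher-style exact test). The function names and all counts are hypothetical illustrations, not figures or procedures taken from this note.

```python
import math


def expected_frequency(n_total: int, n_a: int, n_b: int) -> float:
    """Expected number of co-occurrences of A and B if they are independent."""
    return n_a * n_b / n_total


def prob_zero_cooccurrence(n_total: int, n_a: int, n_b: int) -> float:
    """Hypergeometric probability of observing zero co-occurrences.

    n_total: number of annotated units in the corpus (e.g. clauses)
    n_a:     number of units that contain category A
    n_b:     number of units that contain category B

    P(X = 0) = C(n_total - n_a, n_b) / C(n_total, n_b), computed with
    log-gamma so that large corpus counts do not produce huge integers.
    """
    if min(n_total, n_a, n_b) < 0 or max(n_a, n_b) > n_total:
        raise ValueError("counts must satisfy 0 <= n_a, n_b <= n_total")
    if n_a + n_b > n_total:
        # A and B cannot avoid each other: zero co-occurrences is impossible.
        return 0.0
    log_p = (
        math.lgamma(n_total - n_a + 1)
        + math.lgamma(n_total - n_b + 1)
        - math.lgamma(n_total + 1)
        - math.lgamma(n_total - n_a - n_b + 1)
    )
    return math.exp(log_p)


if __name__ == "__main__":
    # Invented counts for illustration only: 100,000 annotated clauses,
    # 500 containing A, 2,000 containing B, and no clause containing both.
    n_total, n_a, n_b = 100_000, 500, 2_000
    print("expected co-occurrences:", expected_frequency(n_total, n_a, n_b))
    print("P(observed = 0):", prob_zero_cooccurrence(n_total, n_a, n_b))
```

With counts of this kind the expected frequency is well above zero, so the absence of the combination is unlikely to be accidental. If the expected frequency were itself close to zero, the same calculation would return a large probability, and the non-occurrence would carry no evidential weight.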
Any hypothesis about possible and impossible structures in language
is ultimately a hypothesis about the incompatibility of two (or more)
linguistic categories. As long as these categories can be operationalized
in such a way that they can be exhaustively annotated (or identified
spontaneously) in a corpus of naturally occurring language, and as long
as the corpus is large enough, this corpus can provide both positive and
negative evidence. The first condition should always be met: if a category
cannot be operationalized for objective identification, it has no place in
a linguistic theory. The second condition is not currently met. There are
several syntactically annotated corpora (for example, the Penn Treebank,
Sampson’s SUSANNE and CHRISTINE corpora, and the ICE-GB used in this
note), but they are either too small for many research questions, or their
annotation scheme is too coarse or too unreliable, or both. However,
this cannot seriously be used as a defense of the introspective method.
Instead, it must be used as an argument for the funding and the human
resources necessary for the construction of large grammatically annotated
corpora. A discipline can only get so far by thought experiments (if
that is what acceptability judgments are). It begins to make substantial
headway only when it faces up to the problem of data scarcity and solves
it. Astronomers have built radio telescopes, physicists have built particle
colliders, and geneticists have sequenced the human genome; linguists
should be able to construct large, balanced, syntactically annotated corpora
of at least the world’s major languages. But even until this goal is
reached (or, more likely, in case it is never reached), corpora can yield
both positive and negative evidence for the construction of linguistic
theories.
Final remarks: the occurring and the non-occurring
The main point of this note was to show that corpora contain negative
evidence and that this negative corpus evidence can, and should, replace
introspective acceptability judgments. It seems appropriate, however, to
discuss the most important theoretical implications of such a step.
First, from the perspective advocated here, the non-occurrence of a
particular linguistic structure is merely the limiting case; it is not qualitatively
different from very rare occurrences. This may seem to be a problem
for an approach that argues for an absolute distinction between
possible and impossible configurations of linguistic categories (for example,
between grammatical and ungrammatical structures). This problem