10.01.2014 Views

Negative evidence and the raw frequency fallacy* - CiteSeerX



72 A. Stefanowitsch

whether zero deviates significantly from this expected frequency (the sufficient condition for upholding the hypothesis). This information will likely be more difficult to obtain or estimate than information about complementation patterns, but to do so is by no means impossible.
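
The reasoning about zero and an expected frequency can be illustrated with a minimal sketch. It assumes a simple chance model in which each token of one category has the same probability of co-occurring with the other category; this is not necessarily the test used in this note. The function names and all counts are hypothetical and serve only as illustration.

```python
def expected_cooccurrence(freq_a: int, freq_b: int, corpus_size: int) -> float:
    """Expected number of co-occurrences of categories A and B
    if the two categories combine purely by chance (independence)."""
    return freq_a * freq_b / corpus_size


def prob_zero_cooccurrences(freq_a: int, freq_b: int, corpus_size: int) -> float:
    """Binomial probability of observing zero co-occurrences by chance.

    Assumption: each of the freq_a tokens of A independently has
    probability freq_b / corpus_size of occurring with B.
    """
    p = freq_b / corpus_size
    return (1.0 - p) ** freq_a


if __name__ == "__main__":
    # Hypothetical counts, not taken from any real corpus.
    freq_a, freq_b, corpus_size = 3000, 2000, 1_000_000
    print(expected_cooccurrence(freq_a, freq_b, corpus_size))
    print(prob_zero_cooccurrences(freq_a, freq_b, corpus_size))
```

With these invented counts, six co-occurrences would be expected by chance, and the probability of observing none is roughly 0.0025, so an observed frequency of zero would deviate significantly from expectation at conventional thresholds. With much smaller counts the same zero would be uninformative, which is why corpus size matters for negative evidence.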

Any hypothesis about possible and impossible structures in language is ultimately a hypothesis about the incompatibility of two (or more) linguistic categories. As long as these categories can be operationalized in such a way that they can be exhaustively annotated (or identified spontaneously) in a corpus of naturally occurring language, and as long as the corpus is large enough, this corpus can provide both positive and negative evidence. The first condition should always be met: if a category cannot be operationalized for objective identification, it has no place in a linguistic theory. The second condition is not currently met. There are several syntactically annotated corpora (for example, the Penn Treebank, Sampson’s SUSANNE and CHRISTINE corpora, and the ICE-GB used in this note), but they are either too small for many research questions, or their annotation scheme is too coarse or too unreliable, or both. However, this cannot seriously be used as a defense of the introspective method. Instead, it must be used as an argument for the funding and the human resources necessary for the construction of large grammatically annotated corpora. A discipline can only get so far by thought experiments (if that is what acceptability judgments are). It begins to make substantial headway only when it faces up to the problem of data scarcity and solves it. Astronomers have built radio telescopes, physicists have built particle colliders, and geneticists have sequenced the human genome; linguists should be able to construct large, balanced, syntactically annotated corpora of at least the world’s major languages. But until this goal is reached (or, more likely, in case it is never reached), corpora can yield both positive and negative evidence for the construction of linguistic theories.

Final remarks: the occurring and the non-occurring

The main point of this note was to show that corpora contain negative evidence and that this negative corpus evidence can, and should, replace introspective acceptability judgments. It seems appropriate, however, to discuss the most important theoretical implications of such a step.

First, from the perspective advocated here, the non-occurrence of a particular linguistic structure is merely the limiting case; it is not qualitatively different from very rare occurrences. This may seem to be a problem for an approach that argues for an absolute distinction between possible and impossible configurations of linguistic categories (for example, between grammatical and ungrammatical structures). This problem
