Negative evidence and the raw frequency fallacy* - CiteSeerX

62 A. Stefanowitsch

In this note, I would like to take issue with (a large part of) their argument. I will argue that the idea that corpora do not contain negative evidence is simply a special case of what I have termed the observed-frequency (or raw-frequency) fallacy, i. e., the belief that “[o]bserved frequencies of occurrence represent relevant facts for scientific analysis” (Stefanowitsch 2005: 296). When approached with the right methodological tools, corpora do provide negative evidence, i. e., evidence that allows us, in principle, to distinguish between constructions that did not occur but could have (these can be referred to as ‘accidentally absent’ structures) and constructions that did not occur and could not have (these can be referred to as ‘significantly absent’ structures). Thus, while I do agree that linguists cannot (and should not) ‘eschew introspection entirely’, I will argue that they can (and largely should) eschew introspective judgments of acceptability.
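The distinction between accidentally and significantly absent structures can be illustrated with a toy calculation. The sketch below is not the author's procedure. For illustration only, it assumes that a verb's tokens are scattered at random over all verb slots in a corpus (a hypergeometric model), and every count in it is invented. Under that assumption, a frequency of zero is unremarkable when the expected frequency is close to zero, but improbable when the expected frequency amounts to several occurrences. The function names `expected_frequency` and `prob_zero` are hypothetical.

```python
from math import comb


def expected_frequency(verb_total: int, cxn_total: int, corpus_total: int) -> float:
    """Co-occurrences expected if verb and construction were independent."""
    return verb_total * cxn_total / corpus_total


def prob_zero(verb_total: int, cxn_total: int, corpus_total: int) -> float:
    """Chance probability of observing the verb 0 times in the construction.

    Assumed (hypothetical) model: corpus_total verb slots, cxn_total of them
    in the construction; the verb's verb_total tokens fall on slots at
    random, so the count in the construction is hypergeometric.
    """
    return comb(corpus_total - cxn_total, verb_total) / comb(corpus_total, verb_total)


if __name__ == "__main__":
    # Invented corpus: 100,000 verb slots, 1,000 of them ditransitive.
    CORPUS, DITRANSITIVE = 100_000, 1_000
    # Two invented verbs, neither of which ever occurs in the ditransitive.
    for name, total in [("rare verb", 5), ("frequent verb", 500)]:
        e = expected_frequency(total, DITRANSITIVE, CORPUS)
        p = prob_zero(total, DITRANSITIVE, CORPUS)
        print(f"{name}: expected {e:.2f}, P(0 observed) = {p:.4f}")
```

With these invented figures, the absence of the rare verb (expected frequency 0.05) has a chance probability above 0.9 and is plausibly accidental. The absence of the frequent verb (expected frequency 5) would arise by chance less than 1% of the time, which is what ‘significantly absent’ amounts to in statistical terms.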

Collostructional analysis and the significance of absence

In this section, I will address the general issue of how significant absences of a particular configuration of linguistic elements can be distinguished from accidental ones, using as an example the ‘ability’ or ‘inability’ of English verbs to occur with ditransitive complementation. The choice of this example is motivated primarily by practical considerations: as will presently become clear, the method I will use requires the researcher to extract all occurrences of the grammatical phenomenon in question exhaustively from a corpus. Ditransitive complementation happens to be one of the features that are tagged relatively uncontroversially in the largest grammatically annotated balanced corpus currently available, the British component of the International Corpus of English (ICE-GB, cf. Nelson et al. 2002). However, it is a welcome coincidence that this is precisely the complementation pattern that McEnery and Wilson chose to demonstrate the need for grammaticality judgments.¹

The relevant method is one of several that Gries and I have developed in a series of publications specifically for the purpose of investigating the relationship between grammatical constructions and the words occurring in them, and that we refer to collectively as collostructional analysis (cf. e. g., Stefanowitsch and Gries 2003, 2005, to appear a; Gries and Stefanowitsch 2004a, b, to appear).² The most basic of these methods, simple collexeme analysis, allows the researcher to identify words that occur significantly more or less frequently than expected in a given slot of a construction. This is done on the basis of a standard 2-by-2 contingency table containing four observed frequencies: (a) the frequency of a given word in a particular slot of a given construction, (b) the frequency of the same word in the corresponding slots of all other
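The 2-by-2 table behind simple collexeme analysis can be sketched as follows. The text breaks off after cells (a) and (b). This sketch assumes that (c) is the frequency of all other words in the slot of the construction and (d) the frequency of all other words in all other constructions. It also assumes, as its significance test, the Fisher-Yates exact test computed one-tailed in the direction of the observed deviation. The function names `hypergeom_pmf` and `collexeme_test` and all counts are hypothetical.

```python
from math import comb


def hypergeom_pmf(k: int, row1: int, col1: int, n: int) -> float:
    """P(cell a == k) for a 2x2 table with fixed margins (hypergeometric)."""
    return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)


def collexeme_test(a: int, b: int, c: int, d: int) -> tuple[float, str, float]:
    """One-tailed Fisher-Yates exact test on a word/construction 2x2 table.

    a: word in the construction's slot     b: word in other constructions
    c: other words in the slot (assumed)   d: other words elsewhere (assumed)
    Returns (expected frequency of a, direction, one-tailed p-value).
    """
    n = a + b + c + d
    row1 = a + b                      # total frequency of the word
    col1 = a + c                      # total frequency of the construction slot
    expected = row1 * col1 / n
    k_min = max(0, row1 - (n - col1))  # smallest value cell a can take
    k_max = min(row1, col1)            # largest value cell a can take
    if a >= expected:
        # Attraction: probability of a count at least as high as observed.
        tail = range(a, k_max + 1)
        direction = "attracted"
    else:
        # Repulsion: probability of a count at least as low as observed.
        tail = range(k_min, a + 1)
        direction = "repelled"
    p = sum(hypergeom_pmf(k, row1, col1, n) for k in tail)
    return expected, direction, p


if __name__ == "__main__":
    # Invented counts: a frequent word never found in the slot (a = 0).
    print(collexeme_test(0, 500, 1_000, 98_500))
    # Invented counts: a word found in the slot far more often than expected.
    print(collexeme_test(20, 30, 980, 98_970))
```

Because the direction is set by comparing the observed frequency with the expected one, an observed frequency of zero is handled like any other value: it yields a small p-value only when the expected frequency is large enough, which is the sense in which an absence can count as evidence.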
