
Connectionist Modeling of Experience-based Effects in Sentence Comprehension

Magisterarbeit (master's thesis) in Computational Linguistics
Institut für Linguistik, Universität Potsdam

Submitted by
Felix Engelmann
Matrikelnummer: 716604

Potsdam, February 2009

First reviewer: Prof. Dr. Shravan Vasishth
Second reviewer: Dr. Heiner Drenhaus


Abstract

This thesis addresses the role of individual linguistic experience in computational models in psycholinguistics. Recent research increasingly uncovers language-specific and speaker-specific processing differences, which pose a challenge to language-independent universal models. Starting from the assumption that individual and language-specific abilities derive from the speaker's linguistic environment, experience-based theories attempt to capture the highly complex relations between corpus-based regularities and linguistic ability. While explicitly designed symbolic models can provide only strongly simplified representations, connectionist methods make it possible to build models that learn, with which functional relations between experience and linguistic abilities can be established.

This thesis examines the explanatory power of such connectionist models in comparison to traditional approaches on the basis of two example phenomena. First, the recent literature on subject and object relative clauses in Mandarin Chinese is reviewed. Although the relevant reading studies show very mixed results, they suggest that object relative clauses in Mandarin are processed more easily than subject relative clauses, in contrast to the otherwise cross-linguistic subject preference.

The second phenomenon under investigation is the grammaticality illusion in ungrammatical center embeddings. While in English the omission of an embedded verb yields higher acceptability than the grammatically correct version, this illusion does not arise for German readers.

Building on a connectionist model by MacDonald and Christiansen (2002), which makes consistent predictions for individual differences in the processing of English relative clauses, new simulations are run in order to compare both phenomena with the model's predictions. The simulation results predict an object preference for Mandarin and the absence of the grammaticality illusion in German. The empirical consistency of these results, however, is only superficial and does not withstand closer analysis.



Acknowledgments

I am grateful to Bei Wang, who was so kind as to teach me about Chinese relative clause grammar for this work. For revisions and helpful comments I owe special thanks to Pavel Logaçev and Titus von der Malsburg. Most of all, I want to thank my supervisor Shravan Vasishth for all his support and patience.



Contents

List of Figures VI
List of Tables VII

1 Preliminaries 1
1.1 Introduction 1
1.2 Relative Clauses and Complexity 3
1.3 Psycholinguistic Aspects 6
1.3.1 Memory 6
1.3.2 Expectation 11
1.3.3 Canonicity 12
1.3.4 Experience 13

2 Issues in Relative Clause Processing 18
2.1 The Subject/Object Difference 18
2.2 Chinese Relative Clauses 21
2.3 Predicting RC Extraction Preferences Cross-linguistically 25
2.3.1 Memory 25
2.3.2 Expectation 27
2.3.3 Canonicity 28
2.3.4 Experience 29
2.3.5 Other Explanations 30
2.3.6 Summary 32
2.4 The RC Extraction Preference in Mandarin 32
2.5 Forgetting Effects 38
2.5.1 The Grammaticality Illusion 38
2.5.2 Explaining the Forgetting Effect 44

3 Connectionist Modelling of Language Comprehension 47
3.1 Structure and Learning 47
3.2 Recursion and Complexity 49
3.3 A Model of RC Processing 51
3.3.1 MacDonald and Christiansen (2002) 51
3.3.2 Critique and Relation to other Approaches 54
3.3.3 What is learned? 56
3.3.4 Summary 60

4 Two SRN Prediction Studies 62
4.1 The Model 62
4.1.1 Network Architecture 62
4.1.2 Grammar and Corpora 63
4.1.3 Training and Testing 64
4.2 Replication of Previous Simulations 65
4.3 RC Extraction in Mandarin 66
4.3.1 Simulation 1: Regularity 66
4.3.2 Simulation 2: Frequency 70
4.3.3 Discussion 71
4.4 Forgetting Effects 73
4.4.1 The model 73
4.4.2 Simulation 3: English 74
4.4.3 Simulation 4: German 77
4.4.4 Discussion 80
4.5 Conclusion 81

Bibliography VIII

A Statistics XVII

B Grammars XIX
B.1 English XIX
B.2 German XX
B.3 Mandarin XXII

List of Figures

1.1 DLT memory cost for English RC 8
2.1 English RC reading times (King and Just, 1991) 19
2.2 DLT memory cost for English RCs 25
2.3 DLT memory cost for Mandarin RCs 26
2.4 CC-READER simulation on English RCs (Just and Carpenter, 1992) 28
2.5 DLT memory cost for the three VPs in a doubly embedded ORC 41
2.6 Ungrammaticality in English (Vasishth et al., 2008) 43
2.7 Ungrammaticality in German (Vasishth et al., 2008) 44
3.1 Simple recurrent network (Elman, 1990) 48
3.2 Ungrammaticality simulation (Christiansen and Chater, 1999) 51
3.3 Frequency × Regularity simulation (MacDonald and Christiansen, 2002) 53
3.4 Wells et al. (2009) 58
3.5 Wells et al. (2009) compared with MacDonald and Christiansen (2002) 59
4.1 Replication of MacDonald and Christiansen (2002) 66
4.2 Replication of Konieczny and Ruh (2003) 67
4.3 Output node activations on the relativizer in Mandarin 68
4.4 Simulation 1: Mandarin ORC regularity 69
4.5 Simulation 2: Mandarin SRC Frequency 71
4.6 Simulation 3a: Forgetting effect in English without commas 75
4.7 Simulation 3b: Forgetting effect in English with commas 77
4.8 Simulation 4a: Forgetting effect in German 78
4.9 Simulation 4b: Forgetting effect in German without commas 79

List of Tables

2.1 Languages with subject preference (Lin and Bever, 2006b) 20
2.2 Mandarin corpus study (Kuo and Vasishth, 2007) 24
2.3 RC extraction preference predictions 33
2.4 Studies of Mandarin RC extraction 38
A.1 Statistics for simulation 1 XVII
A.2 Statistics for simulation 2 XVII
A.3 Statistics for simulation 3a XVII
A.4 Statistics for simulation 3b XVIII
A.5 Statistics for simulation 4a XVIII
A.6 Statistics for simulation 4b XVIII
B.1 Mandarin lexicon XXIII


Chapter 1

Preliminaries

1.1 Introduction

Psycholinguistic models of human language processing in the tradition of competence theory (Chomsky, 1965) are anchored in generative grammar theories. So-called strong type-transparency approaches (Berwick and Weinberg, 1984) assume that parsing processes are directly driven by the underlying grammatical structure. There are currently a number of competing grammar-based approaches that rest on different grammatical principles; examples are categorial grammars, head-driven phrase structure grammar, minimalism, and optimality theory (for an overview of current symbolic processing models see Vasishth and Lewis, 2006a). However, empirical data from self-paced reading, eyetracking, and brain imaging studies exhibit difficulty patterns that cannot be explained by a strong link to competence. A precise modeling of human performance requires three aspects of cognition to be accounted for: a) biological constraints, b) gradedness, and c) experience.

a) In contrast to the abstract logical nature of competence theories, human processing performance results from an interaction of linguistic and biological factors. Cognitive psychology is centered around the resource-bounded nature of human cognition. Especially important for real-time cognitive tasks is the limited nature of short-term memory. Common properties of short-term or working memory are a limited capacity, decay over time, and memory interference. These insights from general cognitive psychology suggest that language processing performance is not only constrained by the principles of working memory, but also relies on processing strategies adapted to these constraints. The latter conclusion is addressed in a symbolically abstracted fashion by ambiguity resolution principles like minimal attachment and late closure (Frazier, 1979) or by low versus high attachment preferences in NP attachment ambiguities (Frazier and Clifton, 1996). A theory addressing processing difficulties caused by capacity and decay is the dependency locality theory (DLT, Gibson, 1998; 2000).

b) Especially important for psycholinguistic models of working memory processes is to account for the “continuous, graded nature of human performance” (Vasishth and Lewis, 2006a). Current comprehension models accounting for this aspect are implemented in activation-based architectures, for example CC-READER (Just and Carpenter, 1992) and the ACT-R-based sentence processing model (Lewis and Vasishth, 2005).

c) The third aspect characterizing human cognition, which is increasingly acknowledged in psycholinguistics, is the influence of experience. Probabilistic models like Jurafsky (1996) use corpus-extracted likelihoods to construct probabilistic grammars, thereby capturing aspects of frequency and plausibility. The scope of these models, however, is mostly confined to predicting ambiguity resolution and acceptability ratings. A related theory that predicts word-by-word difficulty is the expectation-based approach of Levy (2008). The tuning hypothesis of Mitchell et al. (1995) instead considers higher-level structural regularities the main aspect of experience. Experience, however, is a vague term that relates to a complex interaction of word and structure frequencies, plausibility, semantic context, and structural regularities. Moreover, the connection between exposure and processing skill is mediated by a learning process, which is itself constrained by biological factors. Thus the link from corpus regularities and the like to observable effects is not trivial.

A class of models that provides a promising approach to experience are connectionist network models (also called models of parallel distributed processing, PDP). In the past, connectionist models have predominantly been used for word-level tasks like spoken and visual word recognition (McClelland and Elman, 1984; Seidenberg and McClelland, 1989), lexical ambiguity resolution (Kawamoto, 1993), phoneme production (Dell et al., 2002), and past tense acquisition (Rumelhart and McClelland, 1985). More and more, however, connectionist approaches are tackling the domain of comprehension (Christiansen and Chater, 1999; Rohde, 2002; Tabor et al., 1997). MacDonald and Christiansen (2002) (henceforth MC02) successfully used a simple recurrent network (SRN, Elman, 1990) to implement their account of skill-through-experience, proposing an interaction of structural regularities with biological factors. Using an SRN, they took advantage of special properties that emerge from interactive, activation-based, parallel distributed connectionism. The inherent architectural properties of such models cause emergent behavior that can be described in terms of limited memory capacity, decay, interference, and continuously graded performance. In addition, their behavior and internal representations are entirely determined by learning. That means that connectionist networks are basically pure grammar-independent performance models (Christiansen and Chater, 1999; p. 3) that incorporate all three aspects of human language processing mentioned above.
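To make the SRN architecture concrete before its detailed discussion in chapter 3, the following minimal sketch shows the forward pass of an Elman-style network: the hidden layer receives the current input word together with a copy of its own previous state (the context layer) and produces a prediction for the next word. All layer sizes, weights, and the one-hot word coding here are illustrative assumptions, not the settings of MC02's model.

    import numpy as np

    rng = np.random.default_rng(0)
    V, H = 10, 5                       # toy vocabulary and hidden layer sizes

    W_ih = rng.normal(0, 0.1, (H, V))  # input -> hidden weights
    W_hh = rng.normal(0, 0.1, (H, H))  # context (previous hidden) -> hidden
    W_ho = rng.normal(0, 0.1, (V, H))  # hidden -> output weights

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def srn_step(word_index, context):
        """One time step: combine the current word with the copied-back
        context layer and predict a distribution over next words."""
        x = np.zeros(V)
        x[word_index] = 1.0                          # one-hot input word
        hidden = sigmoid(W_ih @ x + W_hh @ context)  # new hidden state
        return softmax(W_ho @ hidden), hidden        # hidden becomes next context

    context = np.zeros(H)
    for w in [0, 3, 7]:                              # a toy word-index sequence
        prediction, context = srn_step(w, context)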

A special advantage of experience-based models is that they can account for individual performance differences and language-specific effects in a similar way. Individual differences are addressed in symbolic models like Just and Carpenter (1992) by differences in the capacity limit, but these models do not provide a comprehensive explanation for the origin of such limits. Language-specific effects are different performance patterns on comparable structures in different languages. Evidence for this phenomenon comes from antilocality effects in head-final languages (Konieczny, 2000) and the forgetting effect in complex center-embeddings in English versus German (Vasishth et al., 2008). Language-specific effects usually exceed the scope of models that do not incorporate experience factors. MC02's SRN model has proven to make consistent predictions concerning global and individual differences in the comprehension of subject and object relative clauses.

The aim of the work at hand is to assess this model's predictions on two related phenomena that are most probably the result of language-specific experience: the subject/object difference in Mandarin Chinese and the forgetting effect in multiply embedded object relative clauses in English and German.

S/O Difference in Mandarin
Studies on Mandarin relative clauses are inconclusive regarding the preferred extraction. While all other languages investigated in this respect show a subject preference, Mandarin is a potential candidate for an exception to this otherwise global consistency. This is also what MC02's account of structural regularity is claimed to predict (Hsiao and Gibson, 2003; Kuo and Vasishth, 2007).

Forgetting Effect
The forgetting effect refers to a grammaticality illusion in ungrammatical center-embeddings that is present in English but not in German. Grammatical differences between German and English again suggest an explanation based on experience with structural regularities.

In the remainder of this chapter the syntactic properties of relative clauses will briefly be introduced. Then four relevant explanatory aspects of psycholinguistic models will be discussed: memory, expectation, canonicity, and experience. Chapter 2 will lay out the two issues of the subject/object difference in Mandarin and the forgetting effect in English and German, and discuss potential explanations. Chapter 3 will then explain the properties of simple recurrent networks and discuss MC02's account in detail. Finally, in chapter 4 SRNs will be used to simulate the two issues addressed here, and the resulting predictions will be discussed.

1.2 Relative Clauses and Complexity

The focus of the work at hand is on NP-modifying restrictive relative clauses (RCs) like those shown in example (1). The embedded RC is missing an NP, here represented by e_i, which transformational syntax theories interpret as an unpronounced trace of an extraction movement (e.g., Chomsky, 1981). The trace appears either in subject or object position in the embedded clause and is co-indexed with the relative pronoun that, which binds it to the preceding head noun. The position of the trace depends on the extraction type of the RC. In a subject-extracted relative clause (subject relative clause, subject relative, or SRC) as in (1a), the embedded subject reporter is extracted as the subject of the main clause. In an object-extracted relative clause (object relative clause, object relative, or ORC) as in (1b), the extracted element serves as object of the embedded clause and subject of the matrix clause. Hence, in the ORC the noun reporter fulfills two roles.

3


Chapter 1 Prelim<strong>in</strong>aries<br />

(1) a. The reporter that_i e_i attacked the senator admitted the error. (SRC)
    b. The reporter that_i the senator attacked e_i admitted the error. (ORC)

The example shows subject-modifying relative clauses, i.e., they are attached to the subject noun phrase of the matrix clause. RCs can just as well attach to the object. In the object-modifying case the modified noun fulfills two roles in the SRC (object in the main clause and subject in the RC) but not in the ORC. In language comprehension theories the embedded extraction traces are called gaps that have to be filled to reconstruct the underlying argument structure. That involves identifying a filler (the head noun) and finding the appropriate gap. Theories like the Active Filler Strategy (Frazier and Clifton, 1989; Frazier and Flores d'Arcais, 1989) or the top-down gap-searching mechanism of Lin, Fong, and Bever (2005) deal with this problem in different ways. Memory-based theories assume that unbounded dependencies like non-integrated arguments or fillers have to be stored in linguistic working memory (WM) until the element is reached that is necessary to integrate the dependent into the sentential structure (King and Just, 1991; Just and Carpenter, 1992; 2002; Gibson, 1998; Lewis et al., 2006; Lewis and Vasishth, 2005; Vasishth and Lewis, 2006b). In example (1) the filler noun reporter must be held in memory until the verb attacked is reached, which signals the gap. In (1b) the distance between the filler reporter and its gap is relatively large compared to (1a). The distances between dependent elements can be further increased by multiple RC embedding, which leads us to the third point of interest.

The abovementioned properties make relative clauses especially interesting for psycholinguistic studies of memory processes and gap-filling strategies. An additional point of interest is recursion, which will be discussed now. There are crucial differences in the complexity of embedded SRCs and ORCs. While object relatives in English can be recursively center-embedded in the main clause, multiple embedding of subject relatives results in an iterative right-branching structure. In consequence, ORC embedding causes longer distances between the dependents, illustrated in examples (2a) and (2b) by co-indexation.

(2) Doubly embedded RCs:
    a. The reporter that_1 e_1 attacked the senator that_2 e_2 recognized the officer admitted the error. (SRC)
    b. The reporter that_1 the senator that_2 the officer recognized e_2 attacked e_1 admitted the error. (ORC)

Center-embedded dependencies as in example (2) result from so-called mirror recursion, which can be characterized by the following phrase structure rules:

X → aXa; X → bXb; X → {} (1.1)

The grammar in 1.1 generates unbounded embedding of the form abba. Allowing for infinite recursion introduces substantial formal complexity. Unlike the right-branching found in English SRCs, center-embedding recursion exceeds finite-state expressibility.

Mirror recursion is one of three basic recursion types defined by Chomsky (1957) as relevant for natural language. The second type is identity recursion, which produces cross-dependencies of the form abab, as found for example in Swiss German or Dutch relative clauses. Cross-dependencies are very rare but still problematic because they suggest that language is not context-free. A type even harder to find, if it occurs at all, is counting recursion. It is characterized in terms of an artificial language like a^n b^n, where the number of occurrences of b depends on the number of occurrences of a. It is debatable whether this type of recursion exists in natural language (cf. Christiansen and Chater, 1999), but this question is not relevant for this thesis.
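As an illustration, the following toy generators produce strings of the three recursion types over the symbols a and b; the symbols are purely illustrative stand-ins for dependent word pairs (e.g., nouns and their verbs).

    def mirror(seq):
        """Mirror recursion (center-embedding): 'ab' -> 'abba' (nested)."""
        return "".join(seq) + "".join(reversed(seq))

    def identity(seq):
        """Identity recursion (cross-serial dependencies): 'ab' -> 'abab'."""
        return "".join(seq) * 2

    def counting(n):
        """Counting recursion: the artificial language a^n b^n."""
        return "a" * n + "b" * n

    print(mirror("ab"), identity("ab"), counting(3))  # abba abab aaabbb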

Interestingly, human language performance seems unable to handle recursion beyond double embedding. Double center-embedding already poses severe comprehension difficulties and is often rated as ungrammatical (Blumenthal, 1966). In addition, some studies of center-embedding show a grammaticality illusion for ungrammatical structures (Frazier, 1985; Gibson and Thomas, 1999; Vasishth et al., 2008; Christiansen and Chater, 1999). Although cross-dependency is formally the most complex type, it is center-embedding that is hardest for the human comprehender (Bach et al., 1986). The highly demanding dependencies involved in center-embedding and the potential ambiguities produced by multiple gap positions yield processing difficulties that have been extensively investigated in psycholinguistic studies. The human difficulty with recursive structures is also evident outside of language: for example, center-embedded dependencies between digits and letters produce difficulties similar to those in language processing (Larkin and Burns, 1977). Current symbolic psycholinguistic theories close the obvious gap between grammatical competence and empirical performance by invoking memory limitations, decay, and attention span, or by explicitly defining a limit on the number of recursion levels. However, a question that arises is whether recursion should be assumed in the human processor at all. Since competence cannot be assessed directly, it can be accessed empirically only through its link to performance. But that link in turn depends on the underlying competence theory. This results in the non-falsifiability of an infinite competence, which Christiansen (1992) calls the Chomskian paradox.

“In particular, I suggest that recursion is a conceptual artifact of the competence/performance distinction [...], instead of a necessary characteristic of the underlying computational mechanism.” (Christiansen, 1992; p. 1)

As will be shown in chapter 3, connectionist models are performance models that account for memory limitations, recursion limits, and the characteristics of different recursion types in human comprehension performance. The following section introduces four aspects of psycholinguistics relevant for this thesis.

5


1.3 Psychol<strong>in</strong>guistic Aspects<br />

Chapter 1 Prelim<strong>in</strong>aries<br />

There are four major aspects considered in sentence processing theories that are of relevance for the work at hand: memory, expectation, canonicity, and experience. I will briefly present these and the major related theories here.

1.3.1 Memory

Locality and Antilocality

Complex sentences can contain increased distances between dependent constituents, which result in processing difficulties. For example, Grodner and Gibson (2005) found in a self-paced reading experiment that an increased distance between a verb and its subject correlates with increased reading times at the main verb (example 3).

(3) a. The nurse supervised the administrator while . . .
    b. The nurse from the clinic supervised the administrator while . . .
    c. The nurse who was from the clinic supervised the administrator while . . .

The dependency relations in example (3) hold between the subject noun nurse and the verb supervised. In (3a) the dependent subject can be integrated with its head immediately, since no material intervenes before the verb. In examples (3b) and (3c) the noun-verb distance increases because other dependents or heads intervene. This leads to increased processing time at the integration site, called a locality effect.

Locality effects are most commonly attributed to the limited nature of linguistic working memory, which constrains the process of integration. To bind two dependent elements in the sentence structure, the dependent has to be held in memory until its head is reached. Independently motivated properties of working memory are capacity limits, decay, and interference. Capacity poses an upper limit on memory usage and hence a limit on the number of elements held in memory. Furthermore, the representation of an item decays either as a function of time (Lewis and Vasishth, 2005) or of the complexity of intervening material (Gibson, 1998; Just and Carpenter, 1992), which makes it harder to retrieve correctly from memory. Finally, memorized elements that are similar in certain features can be confused. For example, two nouns that are similar in animacy cause similarity-based interference. The best documented effect attributed to similarity is retrieval interference (Gordon et al., 2001; 2002; 2004; 2006; Lewis and Vasishth, 2005; Van Dyke and Lewis, 2003; Van Dyke and McElree, 2006). Under this interpretation the process of retrieving a dependent noun at the head region is impaired because access is mediated by so-called retrieval cues that correspond to feature-value pairs. When several nouns share similar features, they cannot be distinguished by the retrieval cues. Encoding a noun that shares features with an already encoded noun can be subject to encoding interference (Gordon et al., 2004). And finally, similarity-based interference is sometimes also said to affect processing between encoding and integration; this is called storage interference (Lewis et al., 2006).

In contrast to the locality effects found in English, opposite effects have been found in head-final languages: Konieczny (2000), Vasishth and Lewis (2006b), and Gibson et al. (2005b) found so-called antilocality effects in German, Hindi, and Japanese, respectively. For example, Konieczny (2000) used German stimuli (4) that were comparable to those used by Grodner and Gibson (2005) in their English study.

(4) a. Er hat den Abgeordneten begleitet, und . . .
       He has the delegate escorted, and . . .
       “He escorted the delegate, and . . . ”
    b. Er hat den Abgeordneten ans Rednerpult begleitet, und . . .
       He has the delegate to_the lectern escorted, and . . .
       “He escorted the delegate to the lectern, and . . . ”
    c. Er hat den Abgeordneten an das große Rednerpult begleitet, und . . .
       He has the delegate to the big lectern escorted, and . . .
       “He escorted the delegate to the large lectern, and . . . ”

Reading times at the main verb begleitet were fastest in (4c) and slowest in (4a), contradicting locality predictions. Since antilocality effects were first discovered only in head-final languages, it is commonly assumed that language-specific word order regularities cause the divergent effects. Whereas theories based on integration cost (e.g., DLT: Gibson, 1998) cannot account for antilocality, expectation-based theories (Hale, 2001; Levy, 2008) predict antilocality effects to be caused by an increasing expectation of the verb while more intervening material is read. Notably, expectation theories predict antilocality language-independently. Their predictions receive support from recent evidence (Jaeger, Fedorenko, Hofmeister, and Gibson, 2008) for the presence of antilocality even in non-head-final languages (for an overview of possible explanations concerning locality and antilocality see Vasishth, 2008).

The Dependency Locality Theory

Gibson (1998; 2000) formulated a theory of capacity and decay in working memory in a discrete symbolic fashion. The cost predictions were based on dependency-induced predictions of syntactic nodes and their distances, which is why the original theory was called the Syntactic Prediction Locality Theory (SPLT). The later revised version is called the Dependency Locality Theory, referred to as DLT. The DLT assigns a memory cost to each word in a sentence on the basis of two discrete functions: Integration Cost and Storage Cost. Integration Cost directly accounts for locality effects through a discrete distance measure. It predicts the amount of processing difficulty at the integration site and is defined by the number of discourse referents intervening between the dependent and its head. Valid discourse referents in Gibson's sense are referential constituents like nouns and main verbs, as they refer to objects and events, respectively. Pronouns, however, do not induce memory cost because they are assumed to be immediately accessible. The assumption behind Integration Cost is that every stored item receives an activation which, while the item is maintained in memory, decays with the number of newly encoded discourse referents. Integrating an element, i.e., relating it to its head, requires more processing effort when the element has less activation. Thus Integration Cost is a function that increases monotonically with the number of intervening discourse referents. The cost accounts for decay over time only implicitly, since time is represented discretely by successive discourse referents. The unit of Integration Cost is energy units (EUs).

The memory capacity limit is accounted for by the second principle of the DLT: Storage Cost. It rests on the assumption that the parser constantly predicts the most probable complete sentence structure given the previous material and keeps it in memory. Structural complexity is calculated from the number of syntactic heads contained: the more complex the predicted structure, the more syntactic heads it contains. Every predicted head uses up memory resources, so-called memory units (MUs). Memory load also affects processing, because storage and processing use the same resources (Just and Carpenter, 1992). Consequently, the more heads are predicted, the higher the processing cost. The important difference between the two costs is the location of their effects. While Integration Cost accounts for processing differences only at the integration site, the Storage Cost of a predicted structure affects processing of every following part of the sentence. Figure 1.1 shows the Integration Cost C(I) and the Storage Cost C(S) at each point in an English object relative clause.

ORC     The  reporter  who_i  the  senator  attacked e_i  admitted  the  error
C(I)     0      0        0     0      0         1+2           3       0    0+1
C(S)     2      1        3     4      3          1            1       1     0
Total    2      1        3     4      3          3            4       1     1

Figure 1.1: DLT cost metrics for an English ORC according to Gibson (1998).

Seeing the sentence-initial determiner induces the prediction of a main clause. Hence predictions for an NP and a main verb have to be stored. Note that the DLT considers the prediction of the main verb cost-free, but in the literature it is mostly assigned a cost; for simplicity, in this work Storage Cost will consistently be assumed for the main verb. Once the NP is complete, only the verb is predicted. At the relative pronoun who a Storage Cost of 3 is assigned because an embedded SRC is predicted, containing two heads: the embedded verb and a subject gap. Seeing another determiner changes the prediction into an ORC, which contains one more head, namely the embedded subject. At senator only the embedded verb, the object gap, and the main verb remain predicted. On the embedded verb attacked, two integrations then take place. The subject integration of attacked costs 1 EU because the verb counts as a new discourse referent. Establishing the relation between the relative pronoun who and the empty element consumes 2 EUs because two discourse referents (senator and attacked) have been processed in the meantime. The biggest cost is assigned to the main verb (admitted): here the subject reporter is integrated after three new discourse referents have been processed (the embedded subject, the embedded verb, and the main verb itself). Additionally, an NP head is predicted because the verb is transitive. The last integration takes place at the last word, error. The integration of the determiner produces no cost, whereas building the structural relation with its head admitted consumes 1 EU. Altogether, the total cost predicts the highest difficulty at the main verb, resulting from the long distance to its dependent arguments. Using the total cost as a reading time predictor, the DLT perfectly predicts locality effects.
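As a concrete illustration of the Integration Cost computation, here is a minimal sketch under the simplifying assumptions used above: nouns and verbs introduce new discourse referents, while determiners and relative pronouns do not. The word tags and the single dependency are illustrative, not a general parser.

    REFERENTS = {"noun", "verb"}

    def integration_cost(tags, dependencies):
        """tags[i]: part of speech of word i; dependencies maps each
        integration site to the position of its dependent. Cost in EUs:
        1 if the site itself is a new discourse referent, plus 1 per
        referent intervening between dependent and site."""
        cost = [0] * len(tags)
        for site, dep in dependencies.items():
            intervening = sum(1 for i in range(dep + 1, site)
                              if tags[i] in REFERENTS)
            cost[site] += intervening + (1 if tags[site] in REFERENTS else 0)
        return cost

    # 'The reporter who the senator attacked admitted the error'
    tags = ["det", "noun", "pron", "det", "noun", "verb", "verb", "det", "noun"]
    deps = {6: 1}  # main verb 'admitted' integrates its subject 'reporter'
    print(integration_cost(tags, deps))  # 3 EUs at 'admitted', as in Figure 1.1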

CC-READER

A computational model whose basic assumptions are consistent with the DLT cost assignment is the Capacity Constrained READER, or CC-READER (Just and Carpenter, 1992; 2002), the successor of READER, a sentence reading model implemented in a framework called CAPS (Collaborative Activation-based Production System). CC-READER is an activation-based simulation of linguistic working memory processes with a limited capacity as the explanatory factor for memory load effects and individual differences. The constituting mechanism of CAPS is activation propagation caused by symbolic production rules. Both productions and stored elements (words, structures, propositions, etc.) use the same resources. The availability of elements, as well as the applicability of productions, depends on their received activation exceeding a certain threshold. The condition for a production rule is met when the activation threshold of the respective source element is reached. An important architectural property of CC-READER is the use of processing cycles. In each processing cycle all currently applicable production rules fire simultaneously, meaning they propagate weighted activation from the source to a “target element”. In this sense capacity is defined as the maximum amount of activation available to all productions and stored elements per processing cycle. In case of activation shortage, thresholds can still be reached by incremental production firing, resulting in more processing cycles. This is what happens when the total activation has to be “scaled back” because it exceeds the capacity limit. Back-scaling affects the activation of both productions and stored items: “Any shortfall of activation is assessed against both storage and processing, in proportion to the amount of activation they are currently consuming” (Just and Carpenter, 1992; p. 135). Reading word by word, the parsing process also depends on lexical access and constructs a representation of the sentence that includes thematic role information. Slow-downs in processing are represented by an increased number of processing cycles. The two concepts of activation propagation and processing cycles enable CC-READER to predict a) reading slow-downs in demanding sentence regions (due to storage of many elements or firing of many productions) and b) region-specific individual differences in reading; both predictions are empirically consistent, as comparisons with studies like King and Just (1991) show. All predicted processing difficulties are explained by activation demand. This also covers individual differences, by ascribing them to different limits on the total amount of activation. The effect of decay is accounted for only indirectly, depending on the number of newly activated elements, which is very similar to the DLT Integration Cost. The capacity limit causes newly needed activation to be drawn preferentially from older elements. This results in a continuously graded decay which, however, is not temporally dependent but rather depends on storage and processing demands, just as in the DLT.
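The capacity mechanism can be illustrated with a toy computation: when the summed activation demand per cycle exceeds the capacity limit, all demands are scaled back proportionally, so elements need more cycles to reach threshold. The quantities are illustrative and not taken from Just and Carpenter (1992).

    def cycles_to_threshold(demands, capacity, threshold=1.0):
        """demands: activation requested per cycle by each element or
        production; returns the number of cycles until all reach threshold."""
        scale = min(1.0, capacity / sum(demands))  # proportional back-scaling
        activation = [0.0] * len(demands)
        cycles = 0
        while any(a < threshold for a in activation):
            activation = [a + d * scale for a, d in zip(activation, demands)]
            cycles += 1
        return cycles

    print(cycles_to_threshold([0.5, 0.5], capacity=2.0))            # no shortage: 2 cycles
    print(cycles_to_threshold([0.5, 0.5, 0.5, 0.5], capacity=1.0))  # scaled back: 4 cycles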

ACT-R <strong>Sentence</strong> Process<strong>in</strong>g Model<br />

Another computationally implemented sentence processing model (Lewis and Vasishth, 2005; Lewis et al., 2006; Vasishth and Lewis, 2006b) is built on the cognitive architecture ACT-R (Anderson and Lebiere, 1998; Anderson et al., 2004). Like the CAPS model, ACT-R operates on a sub-symbolic basis of activation propagation; rule application, however, follows more symbolic condition-action pairs. Processing difficulties are predominantly retrieval-based. Elements (memory chunks) such as the lexical entries involved in a production need to be retrieved from declarative memory. The success of the retrieval process depends on the chunk's current activation level and on how well it matches the retrieval cues specified in the production condition. Retrieval cues are feature-value pairs that increase the activation of chunks depending on the number of matched features (associative activation). The total activation of a memory chunk, calculated from its activation level and the cue-based activation, determines both its probability of being retrieved and its retrieval latency. Because several chunks may match the retrieval cues partially, the model can simulate associative retrieval interference: interference distributes associative activation over several lexical entries, causing longer latencies and potentially the retrieval of the wrong chunk. How severely interference affects retrieval depends on the aforementioned activation level. Activation is a fluctuating value that is a function of usage and of decay over time. Cue-based activation and retrieval of a particular element reactivate it, which slows down the decay process.

The parsing process of the ACT-R sentence processing model (Lewis and Vasishth, 2005) combines a left-corner incremental structure building mechanism with top-down, goal-guided syntactic expectations that specify the phrasal category of the structure to be constructed. A rather unconventional assumption of the model is that, despite incremental parsing, the memory representation contains no serial order information that could guide retrieval and attachment preferences. Recency is only implicitly accounted for by the decay function, which affects parsing decisions in addition to cue-matching. What differentiates this model from CC-READER and DLT is its account of interference effects and its temporal decay function. Furthermore, processing difficulty is not represented by processing cycles but directly by estimated processing time. Setting retrieval cues, structural attachment, and shifting attention to the next word have fixed time


values that, in combination with the activation- and interference-based retrieval latency, constitute the predicted reading time for a word. Thus, while CC-READER and DLT make predictions based on resource management, the predictions of the ACT-R model are based on language-independent, psychologically motivated latencies. The ACT-R sentence processing model has shown considerable consistency with empirical data regarding ambiguity resolution, reanalysis, center-embedding complexity, and extraction preferences in relative clauses (Lewis and Vasishth, 2005). Additionally, the cue-based reactivation mechanism accounts for antilocality effects in certain contexts: additional material containing pronominals or other expressions referring to a previously mentioned dependent can reactivate the chunk representing that dependent and thus boost its activation, making it faster to retrieve at the head.
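To make the retrieval dynamics sketched above concrete, the following toy code states the standard ACT-R equations (Anderson and Lebiere, 1998) that the model builds on: base-level activation as the log of summed decaying traces of past uses, associative activation from matching retrieval cues weighted down by cue fan, and an exponential mapping from total activation to retrieval latency. This is a minimal illustrative sketch, not the Lewis and Vasishth implementation; all parameter values, function names, and the example chunk are assumptions.

```python
import math

# Minimal sketch of the standard ACT-R retrieval equations
# (Anderson and Lebiere, 1998); parameter values are illustrative only.
D = 0.5    # base-level decay rate
MAS = 1.5  # maximum associative strength (assumed)
F = 0.2    # latency scaling factor in seconds (assumed)

def base_level(use_times, now):
    # B_i = ln( sum_k (now - t_k)^(-d) ): every past use leaves a trace
    # that decays over time; retrievals add new traces, which is how
    # reactivation slows the effective decay.
    return math.log(sum((now - t) ** -D for t in use_times))

def total_activation(use_times, now, chunk, cues, fan):
    # A_i = B_i + sum_j W_j * S_ji, summed over matching cues.
    # W_j: source activation, split evenly among the cues.
    # S_ji = MAS - ln(fan_j): strength shrinks when more chunks share
    # cue j, which is the source of associative interference.
    w = 1.0 / len(cues)
    assoc = sum(w * (MAS - math.log(fan[feat]))
                for feat, val in cues.items() if chunk.get(feat) == val)
    return base_level(use_times, now) + assoc

def retrieval_latency(activation):
    # T_i = F * exp(-A_i): the higher the activation, the faster the retrieval.
    return F * math.exp(-activation)

# Example: a chunk used at 0.5 s and 2.0 s is retrieved at 3.0 s with
# two matching cues, one of them shared by three candidate chunks.
chunk = {"cat": "NP", "case": "nom"}
cues = {"cat": "NP", "case": "nom"}
fan = {"cat": 3, "case": 1}
print(retrieval_latency(total_activation([0.5, 2.0], 3.0, chunk, cues, fan)))
```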

1.3.2 Expectation

A very different approach from serial memory-based resource management theories is to ascribe reading time effects to the context-dependent plausibility of the evolving structure. For example, the predictability of a word in a given context, empirically quantified by the Cloze completion task (Taylor, 1953), has a considerable effect on eye-tracking and ERP measures in sentence processing (Ehrlich and Rayner, 1981). A theoretically related measure is surprisal (Hale, 2001). Surprisal as used in Levy (2008) is a probabilistic, grammar-based approximation to the negative log Cloze probability, but it in fact yields a better fit to the data because of its logarithmic scaling of effects. Levy (2008) proposes a theory of probabilistic ambiguity resolution by parallel plausibility ranking of possible structures. For a partial string, all complete structures that include the input seen so far as a prefix are considered. At every word w_i a probability distribution P_i(T) is assigned over all possible continuations T, ranking the most probable structures highest. Following Levy, the probability distribution can most straightforwardly be based on a probabilistic grammar extracted from annotated corpora; he makes no obligatory commitment in this respect, however, because the source of the probability or plausibility distribution could in principle also be semantic or phonological in nature. Since in incremental parsing the predictions change over time, the distribution P_i(T) has to be updated with every new input word. A re-ranking of the preferences during this update, caused by an unexpected word, is regarded by Levy as a kind of reanalysis that induces difficulty. In this sense, predicted difficulty is equivalent to surprisal. More precisely, Levy defines difficulty as the relative entropy between the two probability distributions before and after the update: the more the re-ranked distribution differs from the original one, the higher the processing cost. This is related to reanalysis as in Frazier et al. (1983) and other approaches, with the difference that the expectation-based theory of Levy (2008) is not serial but assumes parallel maintenance of all possible (or at least the most probable) structures.
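As a minimal illustration of the two quantities just introduced (with toy numbers and names, not Levy's grammar-based implementation), the sketch below computes the surprisal of a word and the relative entropy between the structure distributions held before and after the update; Levy (2008) shows that, under his assumptions, the two measures coincide.

```python
import math

def surprisal(p_word_in_context):
    # Surprisal in bits: -log2 P(w_i | w_1 .. w_{i-1}).
    return -math.log2(p_word_in_context)

def relative_entropy(p_before, p_after):
    # D(p_after || p_before): how strongly the re-ranked distribution
    # over continuations differs from the one held before the word.
    return sum(q * math.log2(q / p_before[t])
               for t, q in p_after.items() if q > 0)

# Toy distributions over two candidate structures for an ambiguous prefix.
before = {"main clause": 0.9, "relative clause": 0.1}
# An unexpected word forces a re-ranking toward the relative clause.
after = {"main clause": 0.2, "relative clause": 0.8}

print(surprisal(0.1))                   # about 3.32 bits for a low-probability word
print(relative_entropy(before, after))  # large shift -> high predicted cost
```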

In contrast to connectionist prediction models, whose performance depends upon how well the network has extracted probabilistic constraints from the input material, in Levy's approach the structural probabilities of the grammar are perfectly known to the parser. Consequently, Levy's parallel probabilistic resource allocation theory constitutes a sort of competence model with surprisal or relative entropy as a "bottleneck" to comprehension, thus yielding performance-related predictions. Predictions of frequency-based approaches like the tuning hypothesis (Mitchell, Cuetos, Corley, and Brysbaert, 1995) are quite similar to surprisal most of the time but differ fundamentally in head-final structures. Similarly, DLT and surprisal make comparable predictions only in structures that are not head-final. In head-final constructions the preceding dependents provide statistical information about the nature of the head, thus narrowing the prediction. According to the theory, a better prediction (i.e., lower surprisal) facilitates integration at the head. An expectation-based theory therefore predicts language-independent anti-locality effects in head-final structures.

1.3.3 Canonicity

In the literature, the term canonicity with respect to word order is often used as a synonym for regularity and structural frequency. Here these terms shall be distinguished in order to formulate the respective theories clearly.

A theory of canonicity has to answer two questions:

1. What categorial domain is the focus of the canonicity?
2. What makes specific structures canonical?

The categorial focus of canonicity can be grammatical functions, thematic roles, letter sequences, prosody, and the like. The specific structures counting as canonical in these domains can be chosen by structural regularity, by complexity, or simply by convention.

The most common canonicity account goes back to Greenberg (1963); it relates to the basic grammatical functions subject, object, and predicate and is justified by structural regularities. Greenberg classified languages in terms of their canonical word order. He and the subsequent literature count English as a subject-verb-object (SVO) language because simple sentences and most subordinate constructions follow that order. English thus belongs to the second-largest class (41.79% of the world's languages), preceded only by the SOV order, attributed to 44.78% of languages (according to Tomlin, 1986). However, the classification is not equally clear for all languages. German is arguably an SOV language, although its simplest sentence structure follows an SVO order as in English. Erdmann (1990), for example, concludes that German does not fulfill all requirements for an SOV language and should therefore be categorized as SVO.

As mentioned above, structural regularity based on corpus occurrences is not the only possible grounding for a canonicity account. A generative grammar-based account that relates word order canonicity to language processing assumes the language-specific canonical structure to be an internal representation underlying the surface structure (Lin et al., 2005). Thus, in order to comprehend a non-canonically structured sentence, the parser has to transform it back into the canonical order. This extra processing makes non-canonical structures harder to comprehend than sentences mirroring the underlying order. Supported by evidence from Ferreira (2003), this theory extends to thematic roles, which seem to be assigned by heuristics based on a canonical argument structure.

A fully heuristic language processing account has been proposed by Bever (1970). He defined several comprehension strategies that rely on superficial structural similarities. One of them (Strategy D) concerns thematic role assignment:

Strategy D "Any Noun-Verb-Noun (NVN) sequence within a potential internal unit in the surface structure corresponds to 'actor-action-object'." (Bever, 1970)

Bever's strategies cover several categorial levels of structural regularities, ranging from phonemes to complex phrases and clauses. These strategies designate specific structures that appear in a regular fashion at different levels as basic or canonical structures that are predicted or expected by default. Concerning processing difficulty, this account predicts harder processing for structures that do not fit the strategy templates.

Canonicity accounts are closely related to structural frequency. Although it is not obligatory, most accounts that use heuristics base their choice of canonical structures on the frequencies of those structures in language usage. The original classification of languages by Greenberg into SOV, SVO, etc., was clearly attributed to frequently occurring thematic orderings. On the other hand, a theory assuming base-generated orderings in the deep structure of a generative grammar can, of course, argue completely independently of frequencies and, for instance, refer to universal grammar specifications. Likewise, Bever's strategic preferences, although apparently correlating with frequent structures, could be claimed to stem from innate universal principles that help children learn a language. Detailed predictions of a word order canonicity account will be discussed in section 2.3.

1.3.4 Experience

Experience-based theories assume that parsing strategies and processing efficiency are shaped by exposure to language.

Structural Frequency

The literature shows that corpus frequencies can be a good predictor of comprehension difficulty. A consequent assumption is that structures that are used more often should be easier to comprehend than structures that are rarely produced. This assumption posits a parallelism between language production and comprehension, for which there are roughly two possible explanations. One states a causal relation between production and comprehension, meaning that exposure to particular structures shapes the ability to comprehend them. The other assumes that the underlying processes of production and comprehension are basically the same and hence are limited by the same constraints. An experience-based account clearly favors the former explanation, which does not exclude the second possibility but does not depend on it.

The Structural Grain Size

A serious problem for the formulation of symbolic theories handling exposure-based parsing decisions is the question of grain size. As in canonicity accounts, the question to answer is at which structural level information should be taken to affect parsing decisions. A symbolic exposure-based account like that of Mitchell, Cuetos, Corley, and Brysbaert (1995) tabulates the frequencies of specific structures. For each relevant structure there is a table listing its different interpretations (e.g., the attachment site in complex noun phrases). When the parser processes an ambiguous construction, the most frequent of the relevant recorded structures (frames or partial syntactic representations) is merged with the current sentence structure to yield a predicted, disambiguated structure.

"The success of this process depends upon establishing a useful link between aspects of the current material and corresponding features of the established records. This is essentially a category selection or pattern-matching problem." (p. 470)

The recorded structures can be specified in deep detail, say, on a lexical level, or more abstractly, e.g., on the level of phrasal categories. Example (5) (Mitchell et al., 1995) contains a global ambiguity. In (5a) the RC who was outside the house can attach to either the first noun wife or the second noun football star. The same is true for the PP outside the house in (5b).

(5) a. Someone stabbed the wife of the football star who was outside the house.
    b. Someone stabbed the estranged wife of the movie star outside the house.

In an exposure-based account, the parser's decision between noun one (high) and noun two (low) attachment depends on the corpus frequencies of both possibilities. These frequencies could be calculated at several structural levels. For example, frequencies could be tabulated individually for each construction, recording attachment preferences for NP-PP-RC structures as well as for NP-PP-PP structures. Alternatively, the preferences could be tabulated for both constructions pooled together, by recording the occurrences of the more abstract NP-PP-(modifying constituent) structure; a sketch of both tabulation schemes is given at the end of this subsection. The choice of grain size crucially affects the theory's predictions: a too fine-grained record level is in danger of missing some affected constructions, while a very abstract level can lead to overgeneralization. Mitchell, Cuetos, and colleagues categorize existing exposure-based models into a) fine-grained (Spivey-Knowlton, 1994), b) coarse-grained (Cuetos et al., 1996), and c) mixed-grain models. Connectionist network models like MacDonald et al. (1994) and Juliano and Tanenhaus (1994) are counted into the third category, as is the account of Bever (1970). Mitchell et al. essentially argue for

a coarse-grained approach. They present empirical evidence against sub-classifying structures by noun type, animacy, or the like. In French, for example, the statistics over all NP-PP-RC structures yield the correct noun phrase attachment prediction (a high-attachment preference), whereas including statistical information about definiteness and other properties of the noun phrases leads to wrong predictions. In that case the fine-grained information evidently has to be ignored. In a sentence completion study, Corley and Corley (1995) presented evidence that noun phrase attachment preferences in English do not rely on lexical data: they analyzed the by-subject variance of two studies involving the same structures but different lexical items, and the lexical alternation did not affect the (low) attachment preference. Interestingly, for noun phrase attachments with two potential attachment sites there is a high-attachment preference in most languages; exceptions are English, German, Italian, and Swedish, where low attachment is generally preferred. Mitchell et al. (1995) argue that such general preferences are only explainable in coarse-grained models. Cuetos et al. (1996) report that corpus frequencies predict the attachment preferences of two-NP-site ambiguities (i.e., constructions in which there are two preceding NPs to potentially attach to) for Spanish and English. For Spanish, which shows a high-attachment preference, Cuetos and colleagues found that 60% of the NP-PP-RC constructions in the corpus had the RC attached to the first NP (high attachment). For English (low-attachment preference) they found only 38% high attachment in the corpus. Desmet and Gibson (2003) argue against committing a model to a single grain size by showing that in some cases NP attachment preferences are affected at the lexical level. They studied human preferences in eye-tracking as well as corpus frequencies of three-NP-site noun conjunction ambiguities (see example 6). They found that the corpus frequencies support the empirical results, which show a preference for middle over high attachment. Replacing the noun inside the attached phrase with the pronoun one, however, turned the preference into high > middle rather than middle > high (Gibson and Schütze, 1999).

(6) A column about a soccer team from the suburbs and . . .
    a. an article/one about a baseball team from the city were published in the Sunday edition. (high)
    b. a baseball team/one from the city was published in the Sunday edition. (middle)

A similar effect was obtained for German two-NP-site ambiguities (Hemforth et al., 2000): an ambiguity containing an anaphoric binding (e.g., a relative clause) produces a high-attachment preference, while without the anaphoric binding a low-attachment preference results. This leads Desmet and Gibson (2003) to conclude that, in addition to structural information, the occurrences of pronouns have to be tabulated as a predictor. This shows – as Mitchell et al. (1995) also admit – that the exposure-based approach has to find a balance between coarse and more fine-grained measures and that different structures might require different grain sizes for the tabulation of frequencies.
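The sketch referred to above illustrates the two tabulation schemes with hypothetical code and invented toy counts (this is not the implementation of Mitchell et al.): a fine-grained table keyed by the full construction type versus a coarse-grained table pooling all NP-PP-(modifier) structures.

```python
from collections import Counter

# Hypothetical exposure records: (construction, attachment site),
# where "high" = first NP and "low" = second NP. Counts are invented.
observations = [
    ("NP-PP-RC", "low"), ("NP-PP-RC", "high"), ("NP-PP-RC", "low"),
    ("NP-PP-PP", "low"), ("NP-PP-PP", "low"), ("NP-PP-PP", "high"),
]

# Fine grain: one frequency table per construction type.
fine = {}
for construction, attachment in observations:
    fine.setdefault(construction, Counter())[attachment] += 1

# Coarse grain: pool everything as NP-PP-(modifying constituent).
coarse = Counter(attachment for _, attachment in observations)

def predict(table):
    # The parser adopts the most frequently recorded disambiguation.
    return table.most_common(1)[0][0]

print(predict(fine["NP-PP-RC"]))  # fine-grained prediction
print(predict(coarse))            # coarse-grained prediction
```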

Structural Frequency in a Connectionist Network

One type of model that specifically bases its predictions on records of structural frequencies is the connectionist network model. Mitchell et al. (1995) note that "in a connectionist system the records would consist of a set of activation weights distributed throughout the network" (p. 472). This is only partly accurate. The network does not explicitly count frequencies, nor are frequencies stored anywhere in the network. Rather, every exposure of the network to a specific structure immediately changes the weight distributions and thus the behavior of the whole network. One could therefore say that the weight distributions contain implicit structural knowledge. This can be observed, for example, in the activations of hidden layers: inputs with similar structures produce similar activation patterns in the hidden layer of simple recurrent networks (SRNs; Elman, 1990). Comparing these patterns reveals the levels of structural generalization that drive the network's predictions. Unlike in symbolic exposure-based accounts, there is no explicitly fixed structural grain size the network is sensitive to. There is, of course, a lower limit on grain size, defined by the encoding level of the input: if the input string is encoded at the word level, the network has no information below that level to work with. The upper limit depends on the network's architecture and can be affected by the size of the hidden layer, the learning mechanism, and, specifically for recurrent networks, by their "memory span". Which levels the network actually chooses is hard to say in advance. Learning is a walk through a state space in search of the optimal solution to the desired input-output pairing; the choice of grain size is part of that optimization process and can change during the learning phase. A commitment to a specific grain size implicitly involves a commitment to the number of structures to distinguish: a fine-grained model has to keep apart many structural representations, while a very coarse-grained model deals with only a few. This relation means that a network with very few hidden nodes, through which all information has to pass, will only be able to make very high-level generalizations. The final choice of grain size will ideally be the structural level of the internal input representation that is most useful for meeting the output requirements, given the network's architecture. A minimal SRN sketch follows below; chapter 3 discusses the properties of connectionist networks in more detail.
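To make the architecture concrete, here is a toy simple recurrent network in the spirit of Elman (1990), forward pass only (an illustration under assumed sizes and random weights, not the network used in the simulations reported later): the hidden layer receives the current input together with a copy of its own previous state, and its activation pattern is where the implicit structural generalizations discussed above can be inspected.

```python
import numpy as np

# Toy SRN (Elman, 1990): all sizes and weights are arbitrary assumptions.
rng = np.random.default_rng(0)
n_in, n_hid, n_out = 10, 8, 10  # e.g., one-hot coded words in and out

W_ih = rng.normal(0, 0.1, (n_hid, n_in))   # input -> hidden
W_hh = rng.normal(0, 0.1, (n_hid, n_hid))  # context (previous hidden) -> hidden
W_ho = rng.normal(0, 0.1, (n_out, n_hid))  # hidden -> output

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def run_sequence(words):
    # Process a word sequence; the hidden state carries structural
    # information across words and is copied back in at each step.
    hidden = np.zeros(n_hid)
    hidden_states, outputs = [], []
    for word in words:
        hidden = sigmoid(W_ih @ word + W_hh @ hidden)
        outputs.append(sigmoid(W_ho @ hidden))  # next-word prediction
        hidden_states.append(hidden)
    return outputs, hidden_states

# Structurally similar inputs yield similar hidden activation patterns.
sentence = [np.eye(n_in)[3], np.eye(n_in)[7]]
_, states = run_sequence(sentence)
print(states[-1].round(2))
```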

Frequency and Regularity

Structural regularity is the occurrence of similarities between different structures at a certain grain level. For example, the English SRC is more regular than the ORC because, at the level of functional categories (SVO), the SRC is similar to many other structures, whereas the corpus frequency of the OSV order found in the ORC is very low. In that sense, regularity is nothing other than frequency. One can speak of a regularity effect when, for instance, structures or tokens that are not themselves highly frequent receive a sort of "neighbor benefit" from frequent structures that are similar at a certain level; benefit here means a facilitating frequency effect. For example, in word recognition, regular words (with respect to orthography-pronunciation correspondence) are easier to recognize than exceptional words, although a regular word is not necessarily more frequent; it merely shares sub-regularities (i.e., similarities at a lower level) with other words, which has a facilitating effect. Since irregular words do not receive a "neighbor benefit", this leads to the frequency × regularity interaction implemented, for example, in Seidenberg and McClelland (1989). The interaction captures the fact that there is a recognition performance difference between high- and low-frequency irregular words, while this difference is absent for regular words. Seidenberg and McClelland's model is a connectionist architecture that predicts this interaction as a consequence of its learning mechanism. It is the same interaction that the SRN in MacDonald and Christiansen (2002) predicts for English subject and object relative clauses, with ORCs counting as irregular.



Chapter 2

Issues in Relative Clause Processing

2.1 The Subject/Object Difference

One of the issues to be addressed in this work is the processing difference between subject and object relatives. The subject/object difference is a phenomenon extensively discussed in the literature. Studies in many languages show that subject relatives are easier to comprehend than object relatives (see table 2.1 for an overview). The studies are cross-linguistically consistent enough to speak of a universal subject preference. An exemplary study of English RCs that will be of further relevance for the work at hand is King and Just (1991). For that reason I will briefly describe their experiment:

King and Just (1991) King and Just conducted a self-paced reading¹ study of English RCs as in example (1), repeated here as (7). Before the experiment, the participants were grouped by their reading span value obtained with a reading span test (Daneman and Carpenter, 1980); the span value is assumed to be associated with individual memory capacity. The test value was used to group participants into high-, mid-, and low-span readers. The reading time analysis yielded the following results: a) there was a global memory span effect, with increased reading times for participants with a lower span value; b) the memory span effect was larger on the ORC; c) the regions of greatest difficulty were the embedded verb (attacked) and the main verb (admitted); and d) the ORC was globally read more slowly than the SRC. The results showed that readers spent more time on the embedded and the main verb in object relative clauses than in subject relative clauses. Additional comprehension questions yielded significantly lower accuracy for low-span compared to high-span readers, showing that for participants with a lower span value not only was processing slower, but comprehension was also worse. Note that the large extraction type difference on the main verb may be a spillover effect.

Grodner and Gibson (2005) carried out a study that used stimuli with intervening material between the verbs to prevent spillover. This study showed that there is indeed no reading time difference on the main verb caused by extraction type. Also, there was an extraction type × syntactic constituent (embedded/matrix) interaction: the reading times on the embedded verb and the main verb yielded the pattern V_emb < V_main in the SRC and the opposite (V_main < V_emb) in the ORC.

¹ Self-paced reading (Just et al., 1982) is a method for recording word-by-word reading times in online sentence comprehension. Participants read a sentence word by word, pressing a button to make the next word appear. Only the current word is shown; the rest of the sentence may optionally be represented by masking characters.

[Figure 2.1: Reading times for English relative clauses by reading span value (low, mid, and high) (King and Just, 1991).]

(7) English SRC and ORC (King and Just, 1991):
    a. The reporter that_i e_i attacked the senator admitted the error. (SRC)
    b. The reporter that_i the senator attacked e_i admitted the error. (ORC)

(8) German SRC and ORC (Konieczny and Ruh, 2003):
    a. Der Wärter, der_i e_i den Häftling beleidigte, entdeckte den Tunnel.
       The guard, who.nom the.acc prisoner insulted, discovered the tunnel.
       'The guard who insulted the prisoner discovered the tunnel.' (SRC)
    b. Der Wärter, den_i der Häftling e_i beleidigte, entdeckte den Tunnel.
       The guard, who.acc the.nom prisoner insulted, discovered the tunnel.
       'The guard who the prisoner insulted discovered the tunnel.' (ORC)

Similar results concerning the subject/object difference were obtained in German, French, Hindi, Japanese, Korean, and other languages, involving different paradigms such as eye-tracking, self-paced reading, and brain imaging techniques (see table 2.1 for references).


Language              Task                 References
Brazilian Portuguese  RSVP                 Gouvea (2003)
Dutch                 SPR                  Frazier (1987)
                      SPR, eye-tracking    Mak et al. (2002)
English               Lexical Decision     Ford (1983)
                      SPR                  King and Just (1991); Gibson et al. (2005a)
                      ERP                  King and Kutas (1995)
                      fMRI                 Caplan et al. (2002); Just et al. (1996)
                      PET                  Stromswold et al. (1996)
French                phoneme-monitoring   Frauenfelder and Segui (1980)
                      click-monitoring     Cohen and Mehler (1996)
                      eye-tracking         Holmes and O'Regan (1981)
German                SPR                  Schriefers et al. (1995)
                      ERP                  Mecklinger et al. (1995)
Hindi                 SPR                  Vasishth and Lewis (2006b)
Japanese              SPR                  Miyamoto and Nakamura (2003)
Korean                SPR                  Kwon et al. (2004)

Table 2.1: A selection of papers reporting a subject preference (extended table originally from Lin and Bever, 2006b).

Explanations for the processing differences between the two types of RCs must ideally cover the global preference across languages. Hence, inherent differences between the two constructions have to be found that could account for the diverse processing effects. The most reliable cross-linguistic difference between SRCs and ORCs is word order. For example, there is a greater distance between the head noun and the gap in the ORC in English and German (cf. examples 7 and 8). The dependencies involved are assumed to be particularly memory-demanding and to produce locality effects through integration difficulty (e.g., Grodner and Gibson, 2005). Challenging for a cross-linguistic explanation is that in some languages, e.g., Korean, Japanese, and Chinese, RCs are prenominal, i.e., they precede the head noun; others, like Hindi, use both possibilities (Vasishth and Lewis, 2006b). In most cases the position of the RC before or after the head noun does not seem to be a confounding factor, as Korean and Japanese align with post-nominal languages in showing a subject preference. Popular locality-independent word order explanations are canonicity and frequency: in most languages SRCs have a more canonical word order than ORCs and, furthermore, a higher corpus frequency. Apart from syntactic properties,


semantic information also plays an important role in the subject/object difference. For example, experiments by Traxler et al. (2002) showed that animacy and verb-induced plausibility are crucial predictors of the difficulty differences between the two constructions.

Although the global subject preference proves resistant, at least one exception has been reported so far: Mandarin Chinese, where RCs precede the head noun, just as in Japanese and Korean. Hsiao and Gibson (2003) found in an SPR experiment that in Mandarin subject relatives are in fact harder to comprehend than object relatives. Interestingly, since then the literature on Chinese relative clauses has reported mixed results: while Lin and Garnsey (2007) and Qiao and Forster (2008) confirmed Hsiao and Gibson's results, Kuo and Vasishth (2007) and Lin and Bever (2006b) found a subject preference. The apparently unresolved question about Mandarin Chinese might tip the scales in the search for a globally consistent theory of relative clause comprehension. Theories like Gibson's Dependency Locality Theory (Gibson, 1998), which predicts an OR advantage for Mandarin, or the Accessibility Hypothesis (Keenan and Comrie, 1977; Lin et al., 2005), which predicts a global subject preference, might rise or fall as candidates for a theory consistent across languages. For other theories, based on canonicity or word order frequency, further investigations of the Mandarin relative clause structure are necessary before they can make reasonable predictions.

The following section discusses the structure of Mandarin RCs. Relevant theories are then assessed on their predictions concerning English and Mandarin. After that, recent studies on the Chinese SRC/ORC difference are discussed and their results compared to the predictions of the outlined theories. Finally, I turn to the second topic: language-specific forgetting effects in center-embedding.

2.2 Chinese Relative Clauses

Relative clauses in Mandarin Chinese are head-final, i.e., they precede the modified noun. The RC is attached to the noun by the intervening genitive marker de (gen), which here serves as a relativizer.

(9) a. Mandarin SRC:
       [e_i yaoqing fuhao de_i] guanyuan_i xinhuaibugui.
       invite tycoon gen official have-bad-intentions
       (V O S)
       'The official who invited the tycoon has bad intentions.'
    b. Mandarin ORC:
       [fuhao yaoqing e_i de_i] guanyuan_i xinhuaibugui.
       tycoon invite gen official have-bad-intentions
       (S V O)
       'The official who the tycoon invited has bad intentions.'

Subject-extracted RCs (example 9a) start with the embedded verb, before which a subject gap is assumed that has to be filled with the head noun. The SRC's surface structure is 'V N_O de N_S', where N_O is the embedded object and N_S the head noun serving as the RC subject. Object relatives (example 9b) start with the embedded subject, and the object gap is assumed just before the relativizer. The general structure is 'N_S V de N_O', where the head noun (N_O) serves as the RC object.

The pre-nominal nature of Chinese RCs has three major structural consequences that distinguish these constructions from RCs in English and other languages and hence could lead to different theory predictions. The first difference is the position of the gap: in English the filler-gap distance is shorter in the subject relative, while the head-final nature of Chinese yields a shorter distance in object relatives. This, and the fact that the gap precedes the filler, should make a difference for memory-based accounts and gap-searching algorithms. Secondly, the head-final structure produces a temporary ambiguity, especially in the Chinese ORC. In English the start of a non-reduced RC is marked by a relative pronoun (e.g., that); in Chinese, because the relativizer follows the RC, the reader is not necessarily aware of the RC while reading it. Initially, the Chinese ORC has the form of a simple sentence, which should have consequences for parsing and prediction. Finally, the canonicity properties of object and subject RCs are swapped in Chinese: in contrast to English and other languages, where the SRC exhibits the canonical word order, in Chinese it is the ORC that resembles the SVO word order of simple sentences. A further consequence of the RC preceding the noun concerns the complexity of deeper embedding. Interestingly, in Chinese an SRC embedding produces the presumably more complex center-embedding structure, while ORC embedding results in an iterative linear structure.

(10) a. Doubly embedded SRC (Hsiao and Gibson, 2003):
        [e_i yaoqing [e_j gojie faguan de_j] fuhao_j de_i] guanyuan_i . . .
        gap invite gap conspire judge gen tycoon gen official
        (V1 V2 N1 de1 N2 de2 N3)
        'The official who invited the tycoon who conspired with the judge . . . '
     b. Doubly embedded ORC (Hsiao and Gibson, 2003):
        [[fuhao yaoqing e_i de_i] faguan_i gojie e_j de_j] guanyuan_j . . .
        tycoon invite gap gen judge conspire gap gen official
        (N1 V1 de1 N2 V2 de2 N3)
        'The official who the judge who the tycoon invited conspired with . . . '

As can be seen in example (10), the doubly embedded SRC shows a recursive center-embedding dependency between the head nouns and their related gaps, whereas in the doubly embedded ORC the dependencies are linear. In the head-initial language English, embedding results in the opposite complexity properties.

Psychological Reality and Locality The semantic interpretation of the Mandarin RC structure is the same as in other languages. However, its dramatic syntactic difference from post-nominal RCs raises the question of whether head-final constructions are syntactically comparable to head-initial RCs. A cross-linguistic processing theory for RCs should capture all kinds of RCs, which requires that the captured structures induce similar parsing processes. If the Chinese language did in fact not use comparable syntactic realizations, there would be no need for existing syntactic RC theories to fit the Chinese data. Indeed, some researchers treat head-final RCs as adjective-like adjuncts not containing any gap (e.g., for Japanese: Matsumoto, 1997), which would make a filler-gap resolution process unnecessary in parsing Chinese RCs. However, Lin et al. (2005) (also reported in Lin, 2007) provided empirical evidence that Mandarin RC constructions are indeed gap-containing structures that are processed differently from adjunctive phrases. Their experiment used Chinese possessor relative clauses (PRCs), which are similar to adjunctive phrases in surface structure. In contrast to RCs, PRCs do not contain an overt gap. In a canonical PRC as in example (11), the region before the relativizer has the structure 'N1 V _ N2', with the covert possessor gap lying between the verb and the possessee.

(11) huairen bangjia _ laopo de zongcai jueding baojing
     bad-guys kidnap wife DE chairman decide call-police
     'The chairman whose wife some bad guys kidnapped decided to call the police.'

The construction can be slightly modified to alter the gap position. With the marker ba, the gap and the object can appear pre-verbally; inserting the passive marker bei allows the possessor gap to occupy the sentence-initial subject position. In a self-paced reading experiment, Lin and colleagues controlled the material for three different positions of the potential gap and compared the reading times to adjunctive clauses. The results show that the reading speed on the head noun depended on the gap position only for the possessive RCs, not for the adjunctive clauses. These processing differences are interpreted in Lin (2007) as evidence for filler-gap dependencies in Chinese pre-nominal relative clauses. This makes them "psychologically real" (Lin, 2007; p. 9) and, hence, comparable to post-nominal RCs. Another crucial finding was that reading time on the head noun of the PRCs was fastest in the bei condition. Notably, this is the condition with the possessor gap in subject position, making the filler-gap distance longer than in the other two conditions. This clearly contradicts a locality account.

Elided Subject or Gap Assumption For some of the theories examined here it is important to know whether the reader is aware of the initial subject gap in the Chinese SRC, which has the form 'gap V N1 de N2'. Knowledge of the gap can affect integration and memory processes as well as structural predictions and the gap-searching mechanism. For example, if the SRC were the only construction in Mandarin starting with 'V N', the reader would know immediately at the first word that he or she is reading an SRC. Addressing that question, Kuo and Vasishth (2007) performed a corpus study on the Sinica Corpus 3.0 (5 million words), which is summarized in table 2.2. Kuo and Vasishth found 639 SRC-like structures (V N1 de N2), of which only 19% (119) were in fact subject relatives. The majority of the structures were possessive modifiers with an inanimate head noun (see example 12).

(12) a. tisheng   qiye     de   jingzhengli
        increase  company  gen  competitiveness
        'To increase the company's competitiveness.'
     b. guyong  yuangong  de   chengben ...
        hire    employee  gen  cost
        'The cost of hiring an employee ...'

                  SRC-like (V N1 de N2 Predicate)     ORC-like (N1 V de N2 Predicate)
                  N2 animate    N2 inanimate          N2 animate    N2 inanimate
N1 animate            13*            51                    3*            42*
N1 inanimate         106*           469                    1             71

Table 2.2: Table from Kuo and Vasishth (2007), summarizing their corpus study on RC-like structures. Bona fide RCs are marked with an asterisk.

Considering these non-gapped structures and the existence of further non-gapped constructions in Mandarin that start with 'V N', the reader's awareness of the gap is questionable. In Chinese, in an appropriate context, even mono-clausal structures with an elided subject (_ V N) are possible. However, since it is not clear how frequent these structures are, and since they mostly need special contexts, it is not clear how pro-drop mono-clauses affect the parser's predictions. Because the question is not clearly settled, Kuo and Vasishth consider both the Gap Assumption and the Elided Subject Assumption as competing theories. Under the Gap Assumption the reader knows immediately that he or she is reading a subject relative. Under the Elided Subject Assumption, which receives stronger support from the corpus study, the predictions are rather unclear but should not involve a gapped structure.

Garden Path Effects   Lin (2007) reports evidence for garden path effects due to temporary ambiguities in both head-final object and subject RCs. Several reading time studies of Japanese and Chinese Mandarin show a facilitating effect on the relativizer and the head noun when the RC region is disambiguated earlier. The disambiguation was achieved by explicit marking, classifier mismatch (e.g. Hsu et al., 2006; Yoshida et al., 2004), RC-inducing contexts (e.g. Ishizuka et al., 2006), or explicit participant information (Lin and Bever, 2007). The facilitating effect suggests that without disambiguation a reanalysis happens at the region of the relativizer and the head noun because the parser expects a main clause. Seeing this effect not only in the ORC but also in the SRC provides further evidence for the elided subject assumption favored by Kuo and Vasishth (2007).

How are the special characteristics of the Mandarin RC involved in processing differences between subject and object extraction? Do they lead to a subject preference prediction or do they account for a deviation from a cross-linguistically consistent theory? What further properties of Chinese Mandarin have to be taken into account to gain useful predictions? I will now briefly summarize the theories concerning the subject/object difference that were presented in chapter 1 and make their predictions for Mandarin and English relative clauses explicit. Then, empirical studies concerning Mandarin RCs will be examined and related to the theoretical predictions.

2.3 Predicting RC Extraction Preferences Cross-linguistically

2.3.1 Memory

DLT

In the SRC in examples (1a) and (8a) the distance between the filler (the head noun) and its gap is short. The ORC (examples 1b and 8b), on the other hand, contains a distant dependency. The cost metrics as illustrated for English in figure 2.2 show that the Dependency Locality Theory accounts for higher difficulties in the ORC.

SRC     The  reporter  who_i  e_i attacked  the  senator  admitted  the  error
C(I)     0      0        0        0+1        0     0+1       3        0    0+1
C(S)     2      1        3         2         2      1        1        1     0
Total    2      1        3         3         2      2        4        1     1

ORC     The  reporter  who_i  the  senator  attacked e_i  admitted  the  error
C(I)     0      0        0     0      0         1+2          3        0    0+1
C(S)     2      1        3     4      3          1           1        1     0
Total    2      1        3     4      3          3           4        1     1

Figure 2.2: DLT cost metrics for English object and subject relative clauses

The example shows that Integration Cost on the embedded verb attacked is higher for the ORC because two integrations take place at that position. First, attacked is integrated with its subject senator, consuming 1 EU. Establishing the relation between the relative pronoun who and the empty element consumes 2 EU because two discourse referents (senator and attacked) have been processed in the meantime. In the SRC, integrating the empty element is cost-free, while the subject attachment of attacked uses 1 EU because attacked counts as one discourse referent. This results in a 3:1 cost disadvantage for the ORC at the embedded verb. This is partly due to the fact that in the ORC all arguments are integrated at once. In the SRC the object integration of the RC happens later, at senator, consuming 1 EU. Still, the difference with respect to the whole RC region is 3:2 and increases when Storage Cost is counted in. Storage Cost differs from position four on, where the word the in the ORC predicts four heads, while attacked in the SRC predicts only two. The reporter who the predicts an upcoming embedded subject (since reporter is the object), a transitive verb (since the sentence has an object), an object gap, and the main verb. The reporter who attacked only predicts the main verb and a direct object for the embedded clause. Summing the cost metrics together, we obtain a total cost of 12 units for the ORC compared to 9 units for the SRC throughout the RC. Relating processing cost to processing time, the DLT clearly predicts a processing advantage for SRCs on the whole embedded clause. Consistent with the King and Just (1991) study and the correction by Grodner and Gibson (2005), the DLT predicts no clause type difference on the main verb. However, both studies show the longest reading times at the embedded verb in the ORC, whereas the DLT predicts a higher difficulty at the main verb.
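To make the two cost components explicit, the following minimal Python sketch transcribes the contrasts just discussed from figure 2.2; the numbers are the figure's, while the variable names and output format are only illustrative.

# Minimal sketch of the two DLT cost contrasts discussed above. The
# numbers are transcribed from figure 2.2; everything else is an
# illustrative convention, not part of the DLT formalism itself.

# Integration Cost (in Energy Units) at the embedded verb 'attacked':
ic_src_verb = 0 + 1   # SRC: gap integration is free, subject attachment costs 1 EU
ic_orc_verb = 1 + 2   # ORC: subject attachment (1 EU) plus filler-gap integration (2 EU)

# Storage Cost (number of predicted syntactic heads) at the fourth word:
sc_src_pos4 = 2       # 'The reporter who attacked': main verb, embedded object
sc_orc_pos4 = 4       # 'The reporter who the': subject, transitive verb, gap, main verb

print(f"Integration at embedded verb: SRC {ic_src_verb} EU vs. ORC {ic_orc_verb} EU")
print(f"Storage at position 4: SRC {sc_src_pos4} vs. ORC {sc_orc_pos4} heads")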

All in all, a subject preference is predicted for English. Now, what does the DLT have to say about Chinese? Figure 2.3 shows the costs for Mandarin subject and object relative clauses as assumed by Hsiao and Gibson (2003).

SRC     e_i yaoqing   fuhao    de_i   guanyuan   xinhuaibugui
        gap invite    tycoon   gen    official   have.bad.intentions
C(I)        0           1       1        3            1
C(S)        3           2       2        1            0
Total       3           3       3        4            1

ORC     fuhao    yaoqing e_i   de_i   guanyuan   xinhuaibugui
        tycoon   invite gap    gen    official   have.bad.intentions
C(I)      0          1          0        1            1
C(S)      1          1          2        1            0
Total     1          2          2        2            1

Figure 2.3: DLT Integration and Storage Cost for Mandarin relative clauses.

Integration Cost mainly predicts a difference on the head noun guanyuan 'official'. The reason is the greater distance between the head noun and the embedded gap position in the SRC. Also on the relativizer there is an ORC advantage, because at this point in the SRC the referred subject is integrated with the embedded verb. Storage Cost predicts higher difficulty for the SRC from the first word on. Hsiao and Gibson assume that at that position the reader already knows that an RC is following because the sentential subject is missing. In other words, the gap is overtly recognized and affects the prediction. For the same reason the second word predicts more heads in the SRC than in the ORC, namely the missing subject and a main verb. In the ORC, on the other hand, only a direct object is predicted, because the initial RC is temporarily ambiguous with the beginning of a main clause. The resulting prediction of the DLT is a higher processing cost on the first two words of the SRC compared to the ORC. This means an object preference for Mandarin head-final RCs. However, the SRC Storage Cost predictions on the first two words are questionable. They rest on the assumption that the RC structure is the most probable one, given a missing subject. This would be the case under the Gap Assumption. However, assuming that DLT's prediction choices correlate with corpus frequencies, it is very unlikely that the parser should predict a relative clause before seeing the relativizer. Consequently, under the Elided Subject Assumption the DLT would predict fewer heads on the first two words of the RC. Under this assumption the object preference should disappear in that region.
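The regional predictions can be read off the figure directly; the following small sketch (an illustration added here, not part of Hsiao and Gibson's analysis) sums the Total rows of figure 2.3 over the regions discussed above.

# Region sums over the Total rows of figure 2.3 (Gap Assumption costs).
totals_src = [3, 3, 3, 4, 1]   # e_i+yaoqing, fuhao, de, guanyuan, xinhuaibugui
totals_orc = [1, 2, 2, 2, 1]   # fuhao, yaoqing+e_i, de, guanyuan, xinhuaibugui

# Pre-relativizer region (first two positions): the SRC is costlier,
# i.e., an object preference is predicted there.
print(sum(totals_src[:2]), "vs.", sum(totals_orc[:2]))   # 6 vs. 3

# Head noun position: Integration Cost also favors the ORC here.
print(totals_src[3], "vs.", totals_orc[3])               # 4 vs. 2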

Computational Models

The CC-READER model rests on assumptions similar to the DLT's, namely capacity limitation and integration-based decay. A simulation of King and Just's study produced comparable results (see figure 2.4 from Just and Carpenter, 1992). However, the span × RC type interaction on the main verb and the last word in the RC region was not predicted (MacDonald and Christiansen, 2002). Specifically, there is a greater reading span effect in the ORC than in the SRC in the data, whereas the simulation results show no such difference. Additionally, as Lewis and Vasishth (2005) point out, CC-READER underestimates the difficulty on the ORC embedded verb compared to the main verb. All in all, a subject preference is predicted for English. For Mandarin, in accordance with the DLT, an object preference can be expected. The same would apply to the ACT-R sentence processing model, which makes predictions similar to CC-READER's with respect to English RCs, albeit fitting the results slightly better.

2.3.2 Expectation

In expectation-based theories like Levy (2008) the subject/object difference is accounted for similarly to experience-based theories. Superficially, the highly frequent SRC structure receives a higher ranking in the probability distribution of continuations than the ORC. Consequently, an expectation-based approach would predict a subject preference in all languages where the SRC corpus frequency exceeds the ORC frequency, as is the case in English, German, and Mandarin. However, similarly to experience theories, expectation is a framework without definite commitments. For gaining detailed predictions it is necessary to know the exact word-by-word likelihoods with respect to the grammar. Here again the Gap versus Elided Subject assumptions play a theoretical role.

Figure 2.4: CC-READER simulation results on English subject and object extracted relative clauses (figure from Just and Carpenter, 1992; p. 140).

However, in Mandarin both assumptions would most likely result in the same predictions. Assuming possible elided subjects in a main clause, both a main clause and an embedded RC could be expected in either construction. Thus the mere frequency of subject versus object extractions could indeed be decisive in this case, predicting a subject preference. On the other hand, an overt subject gap in the SRC, as assumed under the Gap Assumption, would lower the cost at the relativizer for the SRC. Due to the main clause ambiguity in the ORC, a costly update of the plausibility ranking would happen at the relativizer, where the ORC becomes more likely than a main clause. This cost is lower in the subject extraction due to the higher ranking of an embedded SRC. Consequently, under the Gap Assumption a syntactic expectation theory would also predict a subject preference.

2.3.3 Canonicity

Considering Greenberg's classification as a basis, what would a canonical word order account predict for English and German relative clauses? English subject relative clauses exhibit the canonical SVO structure, whereas object relatives use an OSV ordering. Therefore, a heuristic or base-generative canonicity theory would assign a higher processing cost to ORCs in English, which is consistent with empirical evidence. The same applies for German, agreeing on the widely accepted SOV classification. As illustrated in example (13), German SRCs have SOV ordering and would be preferred, whereas ORCs have an OSV ordering. If we considered an SVO basis, no clear predictions would be possible.

(13) German SRC and ORC (Konieczny and Ruh, 2003):
     a. Der Wärter, der_i e_i den Häftling beleidigte, entdeckte den Tunnel. (SRC)
                    S        O            V
     b. Der Wärter, den_i der Häftling e_i beleidigte, entdeckte den Tunnel. (ORC)
                    O     S                V

Where would the difficulties show up in reading studies? This is not easy to answer for such a general theory. The location of the effects depends mainly on the underlying parsing theory and how it deals with unexpected structures. Generally, one can assume that slowdowns appear as soon as the reader realizes that he or she is reading a non-canonical structure. In Mandarin this would be the initial verb. Considering a transformational account with canonical base-generation at deep structure, some of the difficulties would most likely appear after having read the whole embedded structure, because then it has to be reordered and integrated. Like English, Chinese Mandarin is also claimed to be SVO (Hsiao and Gibson, 2003; Kuo and Vasishth, 2007). Since ORCs resemble the canonical SVO order but SRCs have VOS, an object preference would be predicted for Mandarin. That means that a canonicity account based on the conventional classifications of SVO ordering would not speak for a cross-linguistic subject preference but rather confirm Hsiao and Gibson's claim of Mandarin being an exception. As stated earlier, the location of effects is not easy to determine.

2.3.4 Experience

Considering only RC type frequencies, there is a clear subject bias in most languages. But a mere comparison of SRC and ORC corpus frequencies is a rather abstract method and not psychologically motivated. A more comprehensive theory of experience would be driven by complex factors. Without a granularity commitment, clear predictions are impossible. As discussed in 1.3, the implementation in a connectionist network could shed light on the complex structural relations. MacDonald and Christiansen (2002) used a simple recurrent prediction network to predict individual and global differences in English subject and object relative clauses. The study will be discussed in detail in chapter 3 and shall only be briefly mentioned here. The network was trained on a simplified grammar of English to make word-by-word continuation predictions. It performed better on SRCs. Furthermore, the interactions found in the results are comparable to the reading span × RC type × region interactions in King and Just (1991) (see figure 3.3 in chapter 3 for details). MacDonald and Christiansen call this a frequency × regularity interaction, because the word-by-word predictions for the SRC benefited from its regularity, specifically its similarity in word order with main clauses. Thus, in this case the experience account can be seen as equivalent to the canonicity account, which makes the same predictions for the same reason. Here the connectionist implementation justifies the canonicity approach on the basis of experience. But this is no inherent connection. The experience account, specifically when implemented as a connectionist network, may well make predictions that diverge from canonicity assumptions. Regarding Mandarin RCs, Hsiao and Gibson (2003) are confident that regularity predicts an object preference. They also say that "[...] it remains an open question how to formalize this theory so that it makes more detailed predictions." (p. 14). Hsiao and Gibson go on to suggest implementing a theory of that kind in a connectionist system like that of MacDonald and Christiansen (2002) to decide the question. The discussion about SRC-like structures and elided subjects will be relevant for such a model. The modeling of Mandarin RC processing is the focus of this thesis and will be addressed in chapter 4.
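To illustrate the kind of architecture involved, the following is a minimal sketch of a simple recurrent (Elman) network making word-by-word predictions. The vocabulary, layer sizes, and the omission of any training loop are simplifying assumptions for exposition; this is not the setup of MacDonald and Christiansen (2002).

# Minimal simple recurrent network (Elman, 1990-style) for next-word
# prediction. All dimensions and the toy vocabulary are illustrative;
# training (backpropagation through time) is omitted, so the untrained
# predictions are near-uniform.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "reporter", "senator", "who", "attacked", "admitted", "error", "."]
V, H = len(vocab), 10                  # vocabulary size, hidden layer size
idx = {w: i for i, w in enumerate(vocab)}

W_xh = rng.normal(0, 0.1, (H, V))      # input -> hidden weights
W_hh = rng.normal(0, 0.1, (H, H))      # context (previous hidden) -> hidden
W_hy = rng.normal(0, 0.1, (V, H))      # hidden -> output weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def step(word, h_prev):
    """One SRN step: new hidden state plus a distribution over next words."""
    x = np.zeros(V)
    x[idx[word]] = 1.0                 # localist (one-hot) input coding
    h = np.tanh(W_xh @ x + W_hh @ h_prev)
    return softmax(W_hy @ h), h

# Probability assigned to each actual continuation (higher = easier),
# in the spirit of prediction-error measures used with such networks.
sentence = "the reporter who attacked the senator admitted the error .".split()
h = np.zeros(H)
for w, nxt in zip(sentence, sentence[1:]):
    p, h = step(w, h)
    print(f"{w:>10} -> P({nxt}) = {p[idx[nxt]]:.3f}")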

2.3.5 Other Explanations

Active Filler Strategy

The Active Filler Strategy (Frazier and Flores d'Arcais, 1989; Frazier and Clifton, 1989) accounts for difficulties and garden-path effects in the region between a filler and its gap. As soon as a filler is identified, the parser starts an active search for the appropriate gap. Intervening potential gap positions produce resource-consuming ambiguities. Following the strategy, in a relative clause the parser would try to insert the head noun as filler immediately after the relativizer that, because this is the first possible gap position. That is a successful strategy in the SRC but results in the need for reanalysis in ORCs. Therefore, the Active Filler Strategy predicts a higher processing cost on noun and verb in ORCs due to reanalysis.

It is not clear what the Active Filler Strategy would predict for head-final relative clauses. Since the filler follows the potential gap positions, the gap search has to happen in a post-processing stage, or maybe by re-reading the embedded clause. Another question is then whether the search proceeds from the beginning of the RC or backwards, starting from the head noun. A reasonable assumption could be that the Active Filler Strategy does not apply at all when the whole phrase has already been seen. Other structural strategies, like the one proposed by Lin and Bever (2006b), probably fit better into a head-final situation. Then the question of the subject/object preference is handed over to accessibility concerns.

Accessibility

A theory of accessibility comes in several versions. What is shared between them, and important for predictions concerning the subject/object difference, is that subjects are more easily accessible than objects. For example, Keenan and Comrie (1977) introduce an accessibility ordering for grammatical functions like the following:

subject > direct object > indirect object > ...

This hierarchy is based on observed preferences for relativized NPs in a number of languages. The explanation is that subjects are more obligatory for predicates than objects are, and therefore are more predictable.

Lin, Fong, and Bever (2005) and Lin and Bever (2006b) suggest that the subject position is higher up in the syntactic tree structure than the object position. They propose an incremental minimalist parser (IMP) that performs a top-down search through the tree from the filler to the gap. The search starts at the head noun and proceeds downward, looking for a c-commanded trace. Since subjects are higher in the tree, this mechanism makes sure that subject traces are always accessed first, irrespective of filler-gap distances or specific word orders: "This top-down searching mechanism overrides the effect of NP recency (i.e. linear locality), and passive complexity (i.e. canonicity)." (Lin et al., 2005; p. 11). Since an Accessibility Theory is independent of word order and locality, it would predict a cross-linguistic preference for subject extractions.

The top-down gap-searching mechanism makes the same predictions for pre- and post-nominal RCs. Consequently a subject preference is also predicted for Mandarin RCs. The facilitation effect in subject extraction would occur on the head noun, where the gap-searching mechanism is initiated. Supporting evidence for this structural account is provided by the PRC experiment by Lin et al. (2005) reported above: the easiest condition was the one where the gap is in subject position. There was no significant difference on the head noun between the other two conditions.

Perspective Shift

A more pragmatic explanation is provided by the theory of perspective shift (MacWhinney and Pleh, 1988; MacWhinney, 1982; 1977). The comprehender preferably adopts the perspective of a sentential subject. Consequently, when the subject changes, the comprehender has to shift his or her perspective. An object relative clause demands a perspective shift from the main clause subject to the RC's subject and then back to the main clause subject after completion of the embedded structure. In subject relatives the subject is the same for both clauses and hence no shifting is required. Perspective shifting demands processing resources. That makes ORCs costlier to process, because two shifts are necessary there and none in SRCs. A slowdown in ORC reading is predicted on the embedded NP (first shift) and the main verb (second shift).

The perspective shifting account predicts a subject preference in most languages with post-nominal RCs. For pre-nominal RCs in a language like Mandarin the pattern would change. In Mandarin ORCs the subject of the RC is in initial position, followed by a verb. Seeing the head noun, a main clause is predicted. Thus one shift is necessary to change perspective from the embedded noun to the head noun, which is now the subject. As for Mandarin SRCs, the locus of perspective depends on the predicted structure when reading the 'V N' sequence. Predicting a gapped structure, i.e., being aware of the SRC, would cause an expectation of a subject head noun. This would not require any shifting. However, what happens when the gap is not recognized? Could perspective possibly center on the RC object, which is the only NP available, wrongly treating it as a subject? If so, reading an SRC would also require one shift, namely from the embedded NP to the sentential subject. This makes SRCs numerically as hard as ORCs. In addition to the perspective shift, a reanalysis would be expected in the SRC. The answer to the questions concerning perspective shift in Chinese depends on the mechanism guiding the reader's perspective when the subject is ambiguous or absent. To sum up, the perspective shift account could probably account for a subject advantage in Mandarin, but this is not clear. If so, an effect is expected on the head noun.

2.3.6 Summary

Table 2.3 shows an overview of the theories addressed here and their predictions regarding English and Mandarin RCs. All mentioned theories agree on a subject preference for English. However, a heterogeneous picture appears on the Mandarin side. There is a slight bias in the prediction pattern in favor of a subject preference, which would integrate nicely into an otherwise universal consistency. Accessibility, expectation, perspective shift, and pure RC type frequency predict a clear subject preference, whereas canonicity, Integration Cost, and Storage Cost under the Gap Assumption predict an ORC advantage. Storage Cost under the Elided Subject Assumption would predict a subject advantage on the RC region and an object advantage on the head noun. The predictions of the Active Filler Strategy are unclear. As for experience, the predictions are also unclear due to the granularity problem; a connectionist implementation, as follows in chapter 4, is expected to make more specific predictions. Anticipating the results, the simulations predicted a weak ORC preference, which appeared, however, only at the relativizer. Accounting for the corpus data by Kuo and Vasishth (2007) even caused a subject preference in the RC region. To find out about the discussed theories' compatibility with empirical data, the next section will report important studies on the subject/object difference in Chinese.

2.4 The RC Extraction Preference in Mandarin

Hsiao and Gibson, 2003

The self-paced reading study by Hsiao and Gibson (2003) was the first to report results addressing the subject/object difference in Chinese. It had great impact on the discussion about the cross-linguistic universality of the subject preference because Chinese was the first exception discovered. Hsiao and Gibson studied singly and doubly embedded Mandarin relative clauses like the ones in examples (2.2) and (10) in a self-paced reading task. For single embedding they found an advantage for ORCs on the region before the relativizer (N1 V1 / V1 N1). For the double-embedded RCs the relevant regions were the 3rd and 4th word (de1 N2 / N1 de1) and the 5th and 6th word (V2 de2 / N2 de2). On both regions an object advantage was measured. Both singly and
doubly embedded RCs show an object preference.

Theory               E-S    C-S    C-O
Canonicity            √             √
IC                    √             √
SC + Gap              √             √
Accessibility         √      √
Expectation           √      √
Perspective           √      √
RC Frequency          √      √
Active Filler         √      ?      ?
SC + Elided Subj.     √     (√)
Experience            √     (√)    (√)

Table 2.3: Extraction preference predictions for English and Mandarin RCs. E-S = English subject preference. C-S and C-O = Chinese subject or object preference, respectively. IC = Integration Cost, SC = Storage Cost.

According to Lin (2007), a garden path effect is expected in both RC types due to an initial misinterpretation as a main clause. A reanalysis should take place at the relativizer/head noun region, especially in the ORC, leading to higher reading times. Interestingly, Hsiao and Gibson's data do not show such an effect. Nevertheless, Hsiao and Gibson argue for an initial misanalysis in the ORC but not in the SRC. Under these premises their result is consistent with the Storage Cost account when interpreted in terms of the Gap Assumption: in the SRC more heads are predicted, since an RC is expected, while in the ORC, according to a main clause interpretation, fewer heads are predicted. Hsiao and Gibson assume that no reanalysis is necessary because the already constructed main clause structure in the ORC does not have to be modified for attaching it as a relative clause. Concerning Integration Cost, the results do not support the theory. As there is no significant effect on the head noun, the results do not show the predicted Integration Cost due to a longer filler-gap distance in the SRC. A naive application of the canonicity account would definitely fit the data. The more canonical ORC structure, which resembles the main clause word order, seems indeed easier to process. However, as Hsiao and Gibson state themselves, it remains unclear what canonicity and structural frequency accounts predict without reasonable evidence for what counts as canonical or for what structural level feeds into frequency calculations.

There are five major issues in Hsiao and Gibson's study that can be criticized.

a) Animacy   Hsiao and Gibson only used animate NPs in their stimuli. Kuo and Vasishth's corpus study (reported in 2.2) revealed that hardly any RC-like structures actually contain two animate NPs. In cases with an animate embedded noun the most frequent continuation contains an inanimate head noun. Only 2% of the SRC-like structures (V N1 de N2) were SRCs with two animate nouns. Of the found ORC-like structures (N1 V1 de N2) about 39% were ORCs. About 93% of these involved an inanimate head noun, leaving only 7% to the structures Hsiao and Gibson studied. This has two consequences. First, the stimuli used by Hsiao and Gibson (2003) are not natural relative clauses but rarely occurring constructions most readers never come across. This may cause a confound in the results. Second, the overall unexpected animacy of the head noun should induce a surprisal effect at this position, which has not been found by Hsiao and Gibson.
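As a cross-check, the percentages just cited can be recomputed from the raw counts in table 2.2; the following short sketch (an added illustration, not from Kuo and Vasishth) does exactly that, identifying the bona fide cells as in the discussion above.

# Recomputing the cited proportions from the table 2.2 counts
# (Kuo and Vasishth, 2007). Layout: counts[N1 animacy][N2 animacy].
src_like = {"animate":   {"animate": 13,  "inanimate": 51},
            "inanimate": {"animate": 106, "inanimate": 469}}
orc_like = {"animate":   {"animate": 3,   "inanimate": 42},
            "inanimate": {"animate": 1,   "inanimate": 71}}

def table_total(t):
    return sum(v for row in t.values() for v in row.values())

n_src_like = table_total(src_like)                   # 639 SRC-like structures
bona_fide_src = src_like["animate"]["animate"] + src_like["inanimate"]["animate"]
print(bona_fide_src / n_src_like)                    # 119/639, about 19%

# Only SRCs with two animate nouns match Hsiao and Gibson's stimuli:
print(src_like["animate"]["animate"] / n_src_like)   # 13/639, about 2%

n_orc_like = table_total(orc_like)                   # 117 ORC-like structures
bona_fide_orc = sum(orc_like["animate"].values())    # animate N1 row: 45
print(bona_fide_orc / n_orc_like)                    # 45/117, about 39%
print(orc_like["animate"]["inanimate"] / bona_fide_orc)  # 42/45, about 93%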

b) Clause Type vs. Embedding Confound in Double Embedding   As mentioned in 1.2 and 2.2, double embedding introduces special complexities which go beyond the word order differences of single embedding. In particular, Mandarin SRC recursion produces a center-embedded structure, while object-extracted embedding results in serial dependencies. This makes multiply embedded ORCs easier than SRCs. Hence, effects discovered in multiple embedding might be due to this difference in complexity, which does not exist in single embedding. Lin and Bever (2006b) interpret this as a confound, questioning the contribution of double-embedding studies, whereas Kuo and Vasishth (2007) do not agree with that view.

c) Gap Assumption   Hsiao and Gibson's explanations for their results rest on the Gap Assumption. However, the evidence provided by the corpus study of Kuo and Vasishth makes that assumption appear unlikely. Relative clauses make up only about 20% of the structurally similar occurrences. Judging by corpus frequencies, a gapped RC structure is the least probable expectation when reading either of the two RC types.

d) Syntactically Ambiguous Verbs   Lin and Bever (2006b) criticize Hsiao and Gibson's study for using verbs that were unbalanced regarding their syntactic arguments. In addition to direct objects, 7 of the verbs took sentential complements and 13 took verbal complements.

e) Inconclusive RC Region   Lin and Bever (2006b) claim that the pre-relativizer region is inconclusive with respect to differences concerning relative clauses because the reader is not yet aware of the RC at that point. Lin and Bever attribute the difficulty in SRCs to the missing subject in an allegedly regular sentence. However, it is not so clear whether it is right to call this a confound rather than a possible explanation for processing difficulties in the RC. Attributing specific effects to relative clause processing does not require the reader to know what structure he or she is actually reading.

Lin and Bever (2006a)

Addressing the ambiguous-verbs confound in Hsiao and Gibson (2003), Lin and Bever (2006a) conducted a self-paced reading experiment with verbs that only took nominal objects. They controlled for RC type and also for whether the RC modified the subject or the object of the matrix clause. In all conditions there was a subject preference on the relativizer and the head noun. No effect was observed on the RC region. The result contradicts the study of Hsiao and Gibson, as the overall preference is the opposite. Strictly speaking, however, it is not necessarily a contradiction, since the locations of the effects do not overlap in the two experiments. So it may theoretically be possible that both results are consistent.

Lin and Garnsey (2007)

The reading time study by Lin and Garnsey (2007) provided evidence that ORCs are easier to comprehend. In addition they showed that animacy information is an important factor in the comprehension process and is used very early by the reader. Their stimuli were Mandarin RCs with another noun following the head noun. The head noun could optionally be omitted. When the head noun was dropped, the second noun could ambiguously be interpreted as the head noun. The confusability of the two nouns was controlled for by animacy. The plausibility of an animate RC head noun compared to an inanimate one was increased by semantic implications of the embedded verb. The results show that animacy information was used immediately to resolve the ambiguity in both conditions (with and without head noun). Subject extraction was more difficult than object extraction in all conditions. The conditions with missing head nouns were the most difficult. When the nouns were confusable, the differences between subject and object extraction were also found in regions after the head noun, pointing to an interaction of RC type with similarity-based interference (Gordon et al., 2006).

Kuo and Vasishth (2007)

Kuo and Vasishth (2007) conducted a self-paced reading experiment that used the singly embedded stimuli of Hsiao and Gibson (2003). Additionally, Kuo and Vasishth added two more conditions to further assess the validity of the Gap Assumption as adopted by Hsiao and Gibson (2003) and of the Storage Cost predictions. The two extra conditions are shown in example (14). In (14b) the ORC is fronted by the passivization marker bei, which is ungrammatical in front of a main clause out of context. Thus, inserting bei removes the main clause ambiguity in the ORC. On the other hand, inserting the demonstrative zheige pre-verbally in the SRC (example 14a) makes the subject gap obvious. The demonstrative in combination with the verb raises the expectation for a relative clause because a noun is needed to fill the gap. Hence the possibility of the structure continuing as a main clause with an elided subject is excluded.

(14) a. Mandarin SRC with demonstrative:
        [Zheige [e_i yaoqing fuhao de_i] guanyuan_i] xinhuaibugui.
        this-CL invite tycoon gen official have.bad.intentions
        'The official who invited the tycoon has bad intentions.'
     b. Passivized Mandarin ORC:
        [Bei fuhao yaoqing e_i de_i] guanyuan_i xinhuaibugui.
        bei tycoon invite gen official have.bad.intentions
        'The official who the tycoon invited has bad intentions.'

If the Gap Assumption is correct and the reader is aware of the gap in the SRC, no difference in reading time is predicted between the demonstrative condition and the one without zheige. As for the ORC, inserting the passivization marker should increase the difficulty in the RC region if the Storage Cost predictions are correct. The reason is that more syntactic heads would be stored in memory when an RC is predicted rather than a main clause, as assumed by Hsiao and Gibson. In short, under Hsiao and Gibson's assumptions the zheige condition should have no effect, whereas the bei condition should increase the difficulty on the ORC. The results showed an overall subject preference in the total reading times. The preference was mainly found on the relativizer and the head noun (de N2). No effect was found on the RC region before the relativizer. This is consistent with Lin and Bever (2006a). On the relativizer (and on the preceding region, but not significantly), the SRC condition without the determiner (SR-no-det) was easier than the condition with the determiner (SR-det). This is not consistent with the Gap Assumption but with the Elided Subject Assumption. The Storage Cost hypothesis under the Elided Subject Assumption would predict fewer syntactic heads in the SR-no-det condition, since not an RC but a main clause is expected. This confirms the prediction of the corpus frequency data. The initial misinterpretation in the SR-no-det condition receives support from another finding: on the regions following the head noun, an increased difficulty was observed in the SR-no-det condition. This could be caused by a reanalysis due to the misinterpretation as a main clause. A difference between the OR-bei and OR-no-bei conditions was only observed on the relativizer. In particular, the no-bei condition was harder. This points to a reanalysis process due to initially misinterpreting the ORC as a main clause. In spite of that, no effects pointing to a misanalysis were observed in the pre-relativizer region.

To summarize, the ORC data of Kuo and Vasishth's study do not support the Storage Cost account, since being aware of an upcoming ORC did not increase difficulty in the RC region. However, evidence for an initial misinterpretation was found in the reanalysis effect on the relativizer. The SRC data support a Storage Cost account only under the Elided Subject Assumption, meaning that readers initially interpret SRCs as main clauses with an elided subject. Altogether, a subject preference effect was shown, located on the relativizer and head noun region as in Lin and Bever (2006a).

Qiao and Forster (2008)

The divergent results from the studies discussed so far are not necessarily contradictory, because no exactly opposite effects were found in comparable regions. Rather, some studies found effects in places where others did not. Throughout the studies, a subject preference was only found at the relativizer and head noun region, while an object preference was mainly observed on the region before the relativizer in the embedded RC. Qiao and Forster (2008) claim both findings to be consistent. They argue that the readers in Hsiao and Gibson's and Kuo and Vasishth's experiments adopted different strategies that led to the contradicting results. Following Qiao and Forster, two strategies are possible in SPR experiments: a) a "wait-and-see" strategy where readers do not commit to a specific structure early in the sentence, and b) a more careful processing of the RC. Effects that may be delayed under the first strategy would show up under the second. In particular, with the careful strategy the SRC structure should be recognized, causing higher difficulty. This would, on the other hand, decrease the difficulty at the relativizer and the head noun because no reanalysis takes place. Qiao and Forster used the Maze Task² (Forster, Guerrera, and Elliot, 2008), which forced readers to adopt the more careful strategy. The results showed an object preference in the relative clause (as in Hsiao and Gibson, 2003) and a subject preference at the relativizer (consistent with Kuo and Vasishth, 2007). Overall, a slight advantage for the ORC was found. The explanation offered by Qiao and Forster is that in Kuo and Vasishth's study readers adopted the wait-and-see strategy, which avoided the difficulties in the SRC. This means that readers do not really predict a main clause in the online reading of an SRC as stated by the Elided Subject Assumption; rather, readers do not make any prediction at all. Considering that the subject preference on the relativizer was not significant by items, the results are consistent with Hsiao and Gibson's data. It is, however, not clear why the participants in Hsiao and Gibson's study adopted a different parsing strategy than the readers in Kuo and Vasishth's study, when both used the same method.

Summary

Many issues of Hsiao and Gibson's initial study were subject to critique. However,
subsequent results were equally distributed between object and subject preference. See
table 2.4 for a summary of the results obtained for the RC extraction preference in Chinese
Mandarin. The table also shows which of the discussed theories are consistent with the
studies. Storage Cost under the Elided Subject Assumption (SC+ES) is not consistent
with any study. Recall that the theory would predict no effect in the RC region and an
object advantage on the relativizer and head noun. This does not fit any of the empirical
findings. The Active Filler Strategy is omitted in the table because it does not make
clear predictions for head-final RCs.

² In the Maze Task the reader has to choose between two words at each point in the sentence. Only
one of the two words is a grammatical continuation. The reader is thus forced to predict a complete
structure.

                HG03      LB06      LG07      KV07      QF08
    Preference  O         S         O         S         O (S)
    Region      RC        de N      RC        de N      RC (de)

Table 2.4: Studies of the RC extraction preference in Mandarin and their consistency
with the discussed theories (Canonicity, IC, SC+GA, Accessibility, Expectation,
Persp.+ES, Frequency, SC+ES, and Experience). HG03 = Hsiao and Gibson (2003),
LB06 = Lin and Bever (2006a), LG07 = Lin and Garnsey (2007), KV07 = Kuo and
Vasishth (2007), and QF08 = Qiao and Forster (2008). IC = Integration Cost, SC =
Storage Cost, GA = Gap Assumption, and ES = Elided Subject Assumption.

I turn now to the second phenomenon to be addressed in this thesis: effects of forgetting
during the processing of complex nested structures.

2.5 Forgetting Effects

2.5.1 The Grammaticality Illusion

Complex nested structures like center-embedded relative clauses are very difficult to
process. Grammaticality rating studies show that these structures are often judged as
ungrammatical. Memory-based theories (Gibson, 1998; 2000; Just and Carpenter, 1992;
Lewis and Vasishth, 2005; Lewis et al., 2006) explain this by the excessive capacity load
evoked by a number of unbounded dependencies that have to be held in memory. The
DLT (Gibson, 2000) predicts parsing slow-downs due to the storage of complex predictions
and decay processes in distant dependencies. Capacity limitations are commonly seen
as cross-linguistic constraints that underlie all sorts of language processing. Hence the
predictions of memory-based theories are language-independent. However, a study by
Vasishth et al. (2008) casts doubt on that claim's validity. Their experiment suggests
that the robustness of memorized representations and related decay effects may well be
dependent upon language-specific grammatical properties. The experiment concerned
the so-called grammaticality illusion of ungrammatical center-embedding structures.

Example (15) shows a sentence pair discussed in Frazier (1985). (15a) is a grammatical
sentence containing a doubly embedded ORC. The center-embedding produces
three consecutive verb phrases (VPs) completing the three clauses from the innermost
to the outermost. In (15b) the second verb phrase was cleaning every week is dropped,
which makes the sentence ungrammatical. I will call the condition in (15b) the drop-V2
condition.

(15) a. The apartment that the maid who the service had sent over was cleaning every week was well decorated.
     b. * The apartment that the maid who the service had sent over was well decorated.³

The surprising observation (attributed to Janet Fodor) was that the ungrammatical
sentence (15b) does not only appear grammatical to English readers; most readers
even judge it better than the grammatical version of the sentence. The finding gained
support from an acceptability rating study by Gibson and Thomas (1999). For their
study Gibson and Thomas used stimuli that either contained all three VPs or omitted one
of them. The results showed that leaving out the second VP causes readers
to rate the sentence as acceptable as the grammatical one containing all VPs. A further
rating experiment conducted by Christiansen and MacDonald (1999) even showed
a higher acceptability for the drop-V2 condition than for the grammatical condition.
The qualitative difference from Gibson and Thomas's study is explainable by the method
Christiansen and MacDonald used. They carried out a so-called "stop-making-sense"
task, which is self-paced word-by-word reading with periodic requests for a grammaticality
rating. The SPR task prevents the participants from re-reading the sentence. This
kind of quasi-online measure may be the cause of a lower rating for the grammatical
but complex center-embedding. As an explanation for the grammaticality illusion, Gibson
and Thomas (1999) propose that the high memory load causes the reader to forget
the second NP (the maid) and with it the prediction of the second VP (was cleaning
every week). Gibson and Thomas basically offer two hypotheses: a) the high memory
cost pruning hypothesis and b) the recency/primacy account. The two approaches
are restated by Vasishth et al. (2008) as the VP-forgetting Hypothesis and the
NP-forgetting Hypothesis, respectively.

a) The VP-forgetting Hypothesis
The original High Memory Cost Pruning Hypothesis rests on the assumptions of SPLT
(Gibson, 1998), the predecessor of DLT. The major proposition as stated by Gibson
and Thomas (1999) is the following:

(16) The high memory cost pruning hypothesis:
     At points of high memory complexity, forget the syntactic prediction(s) with the most memory load.

³ By convention an asterisk (*) indicates ungrammaticality of a sentence.



According to Gibson and Thomas, exceeding a theoretical memory capacity limit through
excessive load causes a loss of costly predictions. A successful parse is possible as long as
memory demands throughout the sentence stay within a certain capacity range. However,
when high complexity causes the load to exceed the limit, a breakdown of the
parser has to be prevented by pruning activation. Given the discrete nature of
SPLT, this means that the predictions of certain syntactic categories have to be dropped.
The pruning hypothesis assumes that the predictions to be forgotten are those causing
the biggest part of SPLT memory cost at the current point in the sentence. In example
(17) the point of highest memory cost is the most deeply embedded subject the clinic (NP3).
At this point two predictions are held in memory: VP2 predicted by NP2 and VP3
predicted by NP3. Since VP2 is further up in the sentence and has to be held longer
in memory than the subsequent VP3, it causes more memory cost. Consequently, the
prediction of the second VP gets pruned and therewith forgotten.

(17) a. [The patient]_NP1 who_i [the nurse]_NP2 who_j [the clinic]_NP3 [had hired e_j]_VP3 [admitted e_i]_VP2 [met Jack]_VP1.
     b. * [The patient]_NP1 who_i [the nurse]_NP2 who_j [the clinic]_NP3 [had hired e_j]_VP3 [met Jack]_VP1.

Vasishth et al. (2008) restate the pruning hypothesis in terms of decay as defined in
the DLT (Gibson, 2000) and refer to it as the VP-forgetting Hypothesis. Vasishth et al.
calculate Integration and Storage Cost at the three VPs to determine the "point of greatest
difficulty" in the sentence. The DLT cost predictions for example (17) are illustrated
in figure 2.5. At the first VP (VP3) two integrations take place: the object the nurse
with two intervening discourse referents (clinic and hired) and the subject the clinic with
one intervening discourse referent (hired) are integrated. At this moment there are two
active predictions held in memory: the predicate of the upper RC (admitted), caused
by reading nurse, and the main verb. This makes a total cost of 4. At the second verb
(admitted) the object the patient and the subject the nurse are integrated. The patient
has a distance of four discourse referents (nurse, clinic, hired, and admitted) from the
verb, the subject nurse is separated by three, and just the matrix verb is predicted. This
makes a total memory cost of 8 at the VP2 site. Finally, on the third VP, integrating
the patient and predicting a direct object yields a cost of 6. Concluding from the
calculations, VP2 has the highest memory cost and, hence, is forgotten.
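To make the counting convention concrete, the following minimal Python sketch (my own
illustration, not code from the cited studies) recomputes the integration costs, i.e. the
C(I) row of figure 2.5, from the linear order of discourse referents:

    # A minimal sketch: an integration's cost is the number of new
    # discourse referents between the dependent and the verb,
    # counting the verb itself.
    referents = ["patient", "nurse", "clinic", "hired", "admitted", "met"]

    def ic(dependent, verb):
        return referents.index(verb) - referents.index(dependent)

    # Integrations performed at each verb of example (17a); the sums
    # reproduce the C(I) row of figure 2.5.
    c_i = {
        "had hired (VP3)": ic("nurse", "hired") + ic("clinic", "hired"),        # 2 + 1
        "admitted (VP2)":  ic("patient", "admitted") + ic("nurse", "admitted"),  # 4 + 3
        "met (VP1)":       ic("patient", "met"),                                 # 5
    }
    print(c_i)  # VP2 carries the highest integration cost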

The difference between Vasishth et al.'s and Gibson and Thomas's account is that
the latter added Storage Cost on the noun and Integration Cost of the predicted verb,
whereas Vasishth et al. just use the total cost on the verb. The predictions, however, are
the same. Let me try to reformulate the decay approach more intuitively. The important
measure of the decay approach is Integration Cost. By counting the number of intervening
discourse referents, it is a discrete, indirect measure of time. Or, as Vasishth et al. put
it: it is "a discretized abstraction over some activation decay function that determines
the strength of a memorial representation." Hence, decay could be described as a function
of time and intervening memory load, with the assumption that a high memory load
increases the speed of decay.

    The patient who the nurse who the clinic . . .
                  had hired    admitted    met Jack.
    C(I)          2+1          4+3         5
    C(S)          2            1           1
    Total         4            8           6

Figure 2.5: DLT memory cost for the three VPs in a doubly embedded ORC.

In our example, high memory load due to non-integrated discourse referents
arises after the source of the VP2 prediction (nurse). That results
in a steeper slope of the decay function, causing the representation of the VP2 prediction
to fall below a certain threshold. The forgetting of the VP2 prediction would account
for the good rating of the ungrammatical condition in the following way. The missing
VP in the drop-V2 condition stays unnoticed and causes no surprise. In addition,
the distance between VP1 and its dependent is smaller, which facilitates retrieval. In
the grammatical condition, on the other hand, the occurrence of the unpredicted VP2
causes parsing failure. Comparing both conditions in a reading time study should show
differences at the matrix verb and the following region.
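As a toy illustration of this reading (my own formulation; the decay rate and threshold
values are arbitrary assumptions), one can let the decay slope depend on memory load
and check which predictions survive:

    import math

    def prediction_state(t, load, base_rate=0.2, threshold=0.15):
        """Toy decay: activation falls exponentially, faster under higher
        load; a prediction counts as forgotten below the threshold."""
        activation = math.exp(-base_rate * (1 + load) * t)
        return activation, activation >= threshold

    # The VP2 prediction (made at "nurse") must survive the whole inner
    # RC; the VP3 prediction (made at "clinic") only a short stretch.
    print(prediction_state(t=6, load=1.0))  # (0.09, False): VP2 lost
    print(prediction_state(t=2, load=1.0))  # (0.45, True):  VP3 retained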

b) The NP-forgetting Hypothesis
Gibson and Thomas (1999) and Vasishth et al. (2008) additionally mention a possible
serial order effect on maintaining several NPs in memory. Evidence from cross-domain
studies on human short-term memory (e.g. Henson, 1998; Baddeley, 1997; Lewis, 1996)
shows a recency/primacy preference, making the most recent and earliest items easier
to recall. This suggests that the representational strength of memorized items exhibits
a U-shaped pattern with respect to their serial order, making middle items harder to
maintain than the rest. Assuming that a recency/primacy preference applies to the
memorizing of noun phrases, this account leads to the NP-forgetting Hypothesis as
follows. High memory load causes the lowest activated middle NP (NP2) to be forgotten.
This results in retrieval failure at VP2 in the grammatical condition, whereas in the
drop-V2 condition no retrieval can be triggered, since the respective verb is missing. As
a result, the grammatical sentence is perceived as more difficult. The effects of the
NP-forgetting Hypothesis should also occur on the matrix verb and beyond.

English

Vasishth et al. (2008) assessed NP-forgetting and VP-forgetting by measuring online reading
time in SPR and eyetracking. Besides the grammaticality manipulation, NP-similarity
contrasts as in example (18) were used to detect NP-forgetting effects. In (18a) all three NPs
are highly confusable, whereas in (18b) the second noun is inanimate, which reduces the
similarity. Following Vasishth et al., high similarity (18a) predicts encoding interference
at NP3 and storage and retrieval interference at VP3. When NP2 is not forgotten,
interference effects should also be seen in the rest of the sentence in the high-interference
condition. However, assuming that the representation of the second NP has decayed
by the middle verb at the latest, no further interference effect should occur. Consequently,
the NP-forgetting Hypothesis predicts that differences between the high-interference
condition (18a) and the low-interference condition (18b) should disappear after the first
verb.

(18) a. The carpenter who the craftsman that the peasant carried hurt supervised the apprentice. (high-interference)
     b. The carpenter who the pillar that the peasant carried hurt supervised the apprentice. (low-interference)

The results of English SPR and eyetracking showed support for the NP-forgetting
Hypothesis, albeit hardly significant. Although non-significant, there was a
clear numerical reading time effect of similarity-based interference, which disappeared at
V2, pointing to a forgetting of NP2 that reduced the interference. The VP-forgetting
Hypothesis was fully confirmed, as can be seen in figure 2.6. The drop-V2 condition was
significantly faster at the main verb and the following region, suggesting a forgetting of
the VP2 prediction and possibly additional difficulty at the main verb in the grammatical
condition. In eyetracking there was also a surprising drop-V2 facilitation effect on the
first verb (V3) not predicted by the forgetting hypothesis. This, however, is explained
by Vasishth and colleagues as an artifact of complexity-induced re-reading behavior in
the grammatical condition.

German

Experiments identical to those laid out above were also carried out by Vasishth et al. (2008)
in German. An example stimulus pair for the grammaticality manipulation is shown in
example (19). The resulting structure of German ORC double-embedding is identical to
the one in English except for the commas. The comma issue will be addressed after the
study results for German have been presented. The investigation of the NP-forgetting
Hypothesis yielded results analogous to the English study. Surprisingly, however, the
VP-forgetting Hypothesis was not confirmed. On the contrary, it was the grammatical
condition that showed faster reading times at V1 and post-V1 in both SPR and eyetracking.
The eyetracking results for the German grammaticality manipulation are shown in
figure 2.7.

(19) a. Der Anwalt, den der Zeuge, den der Spion betrachtete, schnitt, überzeugte den Richter. (grammatical; roughly: 'The lawyer who the witness who the spy watched snubbed convinced the judge.')
     b. Der Anwalt, den der Zeuge, den der Spion betrachtete, überzeugte den Richter. (drop-V2)



Figure 2.6: Effect of the grammaticality manipulation in the English eyetracking study
by Vasishth et al. (2008) (experiment 2, p. 18): Mean reading times and 95% confidence
intervals for the verbs and post-verbal regions.

In addition, German readers do not seem to forget the VP prediction. In fact they seem to notice
the ungrammaticality of the drop-V2 condition, which leads to increased reading time.
This surprising result points to the assumption that linguistic memory processes are not
language-independent but rather affected by language-specific grammatical properties.
The head-final nature of German (SOV) subordinate clauses causes verbs to appear
clause-finally more frequently than is the case in English, an SVO language. An objection
might be that commas in German facilitate the recognition of a completed clause:
a double-embedding involves a comma after the verb of each embedded clause. Vasishth and
colleagues addressed this issue with a fifth experiment involving English sentences enriched
with commas. However, the comma inclusion did not show any effect. Nevertheless, as
Vasishth et al. note, this result does not exclude the possibility of a comma-based facilitation.
An important fact is that German readers are trained on using commas, while
English readers are not, which suggests that the commas were of no use for the English
participants.

I will now investigate the potential explanations for a language-specific forgetting effect.



Figure 2.7: Effect of the grammaticality manipulation in the German eyetracking study
by Vasishth et al. (2008) (experiment 4, p. 24): Mean reading times and 95% confidence
intervals for the verbs and post-verbal regions.

2.5.2 Explaining the Forgetting Effect

Capacity

Just and Carpenter (1992) explicitly mention the possibility of forgetting certain predictions
in the CC-READER model ("forgetting by displacement"). The underlying
mechanism is equal to the pruning hypothesis of Gibson and Thomas (1999), thus making
basically the same predictions as the DLT-based approach described above. As has
just been laid out, the VP-forgetting Hypothesis is seemingly not cross-linguistically
valid. Specifically, the hypothesis has been confirmed for English but disconfirmed for
German. A memory-based explanation could account for the language-specific difference
in two ways: either by postulating a language-dependent capacity limit or a
language-dependent robustness of VP predictions. However, theories like DLT regard
memory processes as universally valid. Thus, if the DLT-based hypothesis is to apply in
both languages, the only option is that German readers possess a higher
memory capacity than English readers. However, there should then be evidence for that
capacity difference from reading span tasks and, considering working memory capacity
as domain-unspecific, there should also be evidence from non-linguistic working-memory-related
tasks. Since there is no such evidence for a language-specific memory span, a
pure DLT-based hypothesis cannot account for the non-existence of the forgetting effect
in German. There have to be additional factors that affect the robustness of the VP2
prediction representation.

The most promising explanation is that processing is affected by certain language-specific
grammatical properties. Vasishth et al. (2008) mention two possibilities for how this
could come about. a) The robustness of the verb representation is directly specified by
the same parameters that shape the grammar and hence the production-based corpus
regularities. b) Alternatively, the more robust representation could be due to
more effective processing caused by reading skill, which is affected by the mentioned
corpus regularities and not by the parameters directly. The first possibility is matched by
an expectation-based account that directly depends on grammatical properties. A
canonicity account would also predict SOV structures to be easier in German than in English.
The alternative of reading skill is accounted for by an experience-based approach.

Expectation

A door to language-specific effects could be antilocality. Antilocality has been observed
predominantly in head-final languages like German (Konieczny, 2000) and Hindi (Vasishth
and Lewis, 2006b). The seeming restrictedness of these effects to head-final languages
has led to the suggestion that the sentence-final verb in these languages is
more strongly expected than in non-head-final languages. However, a recent study by Jaeger
et al. (2008) shows antilocality effects in English, which is not head-final. The cross-linguistic
explanation is that the expectation for a verb increases with more intervening
material. The longer the distance between the dependent and the expected head, the
less likely it becomes that even more adjunctive material will intervene before the head.
Additionally, in most cases the intervening material narrows down the possible candidates for
the head, which lowers surprisal even more. The fact that the associated speed-up at the
verb shows slightly different patterns in English and German encourages an expectation-based
account of a language-specific forgetting effect. For example, it is imaginable that
in head-final languages the prediction is more precise regarding the exact location of the
verb, whereas in other languages head-finality is too rare to provide exact verb-location
statistics.
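The argument can be made concrete with a toy probability calculation (my own; the
distribution over intervenor counts is invented for illustration): whatever that
distribution is, the conditional probability that the verb appears next grows as
intervening material accumulates.

    # Hypothetical probabilities that a clause contains n intervenors
    # before its verb.
    p_intervenors = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}

    for seen in range(4):
        # Renormalise over the continuations still possible after
        # `seen` intervenors have been read.
        remaining = {n: p for n, p in p_intervenors.items() if n >= seen}
        p_verb_next = remaining[seen] / sum(remaining.values())
        print(f"after {seen} intervenors: P(verb next) = {p_verb_next:.2f}")
    # -> 0.40, 0.50, 0.67, 1.00: the verb becomes ever more expected.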

Experience

The robustness of representations could be shaped by experience. An experience-based
account assumes that the reader adapts processing strategies to frequently occurring
structures. As a result, German readers should be more skilled at head-final structures than
English readers. An explanation based on coarse-grained corpus frequencies would be
equivalent to a canonicity approach: German, being an SOV language, exhibits more
head-final structures than English, predicting easier processing. But earlier discussions
have hopefully made clear that these theories rest on weakly justified fundamental
commitments. The more comprehensive approach pursued here is experience with as
few commitments as possible with respect to grain size and structural frames. In this sense
a connectionist implementation in a model like MacDonald and Christiansen (2002) is
promising for showing a systematic difference in the performance on English and German
double-center-embedding. In fact, Christiansen and Chater (1999) have shown that such
a model, trained on center-embedding and right-branching structures, shows better prediction
performance on the ungrammatical 2VP embedding than on the grammatical 3VP
embedding (see chapter 3 for details). Since right-branching and center-embedding reflect
the dependency structure in English SRCs and ORCs, respectively, Christiansen
and Chater's simulation shows that connectionist experience models can in fact account
for forgetting effects comparable to human data. But would this model show a different
performance when trained on a German grammar? This depends on the involved word
order regularities. In contrast to English, German SRCs and ORCs both exhibit
center-embedded dependencies and hence are verb-final. This is a considerable bias and should
have an effect on the model's performance.

Anticipating the simulation results in chapter 4, the word order effects are present but
weak. In contrast, the usage of commas seems to have a greater impact.



Chapter 3

Connectionist Modelling of Language Comprehension

3.1 Structure and Learning

Connectionist networks are prototypical exposure-based models. Or, more precisely,
they are the implementation of non-committed exposure-based accounts. "Non-committed"
here means accounts without any specific assumptions about structural levels or
grain sizes, nor about the linking between corpus regularities and behavior. In the literature
there does not seem to be agreement about which models to call connectionist. It
must be mentioned that there are hybrid models that use parallel distributed activation
spreading between symbolic entities on the one hand (e.g. Just and Carpenter, 1992;
Lewis and Vasishth, 2005), and there are connectionist models that use hand-designed
architectures and local representations on the other hand (e.g. Dell et al., 2002; McClelland
and Elman, 1984; Rohde, 2002). I am concerned here only with "fully connectionist
models" using fully distributed representations and no pre-designed internal structuring.
The most important feature that distinguishes a connectionist network model of that
kind from symbolic models is its architecturally constrained, highly adaptive learning
ability. Connectionist models are functional problem-solving machines that, depending
on the specific learning algorithm and certain architectural properties, are able to find
the optimal solution to any task representable as input-output pairs. The design of
symbolic models mostly involves many assumptions about the desired processes, which
are hard-coded into the system. For example, it has to be specified how to categorize
and represent the input. A connectionist system, on the other hand, starts from zero
without any presumptions. The structure of the internal input representation is shaped
during the learning process depending on the task requirements. Obviously, the structural
information that linguists annotate on word strings of natural language
is already there in the plain strings. Extracting the underlying structure of the input
requires information about sequential and temporal relations between input chunks. For
that reason time is an important component of cognitive tasks. In particular, language
transports highly structured information while being entirely sequential. A memory of
earlier input and the representation of temporal relations between input chunks provides
much contextual information helping to interpret current input. The context of
an utterance has a great influence on ambiguity resolution and predictions of content.

There have been some accounts of providing connectionist networks with temporal
representation which were of explicit nature. This posed limits to the number and richness
of representations. Elman (1990) describes a simple way to provide a connectionist
network with memory, called a simple recurrent network (SRN, figure 3.1). The hidden
representations in the network are copied into a so-called context layer, which influences
the hidden representations in the next step through weighted activation feeding. This
memory loop goes without any explicit representation of time or relations between input
chunks. It is the iterative procedure of copying and back-feeding itself that produces
temporal relations on an implicit level. Because every copy of the activation pattern has
been influenced by all earlier copies, the contextual memory reaches into the "past" in a
continuously graded way over several input steps. The information of earlier input representations
is still in the representation as a trace, but newer input has a more influential
power. Elman (1990) writes:

"In this account, memory is neither passive nor a separate subsystem. One
cannot properly speak of a memory for sequences; memory is inextricably
bound up with the rest of the processing mechanism."

This very simple way of supplying memory yields architecturally determined, plausible
properties that can abstractly be described as storage limitations, memory span, or decay
of memorized representations over time. These are properties explicitly accounted for in
symbolic models like ACT-R or CC-READER.

Figure 3.1: Architecture of a simple recurrent network (SRN, Elman, 1990). The
solid line represents fixed one-to-one connections to the context layer. Dashed lines
represent trainable connections.

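For concreteness, here is a minimal numpy sketch of the SRN forward pass (my own
illustration with random, untrained weights; the layer sizes anticipate the MacDonald
and Christiansen setup described in section 3.3):

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_hid, n_out = 31, 60, 31   # localist word units plus EOS

    W_ih = rng.normal(0.0, 0.1, (n_hid, n_in))   # input -> hidden (trainable)
    W_ch = rng.normal(0.0, 0.1, (n_hid, n_hid))  # context -> hidden (trainable)
    W_ho = rng.normal(0.0, 0.1, (n_out, n_hid))  # hidden -> output (trainable)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def predict_sequence(one_hot_words):
        """Word-by-word next-word prediction with the Elman memory loop."""
        context = np.full(n_hid, 0.5)       # context initialised to 0.5
        predictions = []
        for x in one_hot_words:
            hidden = sigmoid(W_ih @ x + W_ch @ context)
            predictions.append(sigmoid(W_ho @ hidden))
            context = hidden.copy()         # fixed one-to-one copy-back
        return predictions

    # A three-word "sentence" as one-hot vectors:
    sentence = [np.eye(n_in)[i] for i in (3, 17, 30)]
    outputs = predict_sequence(sentence)

Note that the context layer is never cleared within a sequence, so each hidden state
carries a graded trace of the whole preceding input, exactly the implicit memory
described above.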



The architecturally emergent nature of task solutions is also often the subject of criticism
concerning connectionist models. The argument goes that the explanatory value of
connectionist models is low since there is no way to know what the network has really
learned. Hence, these models do not provide any insight into parsing mechanisms or
memory representations. Addressing that issue, Elman (1990) showed that an SRN in a
way learns the lexical classes as we use them. For that, Elman trained the network on a
simple word prediction task that involved simplified natural sentences such as Noun-Verb or
Noun-Verb-Noun sequences. The comparison of hidden unit activation vectors yielded
a hierarchical clustering of words into natural classes like verbs and nouns, which were
sub-clustered by transitivity or animacy, respectively. The categorial clustering
of representations depends directly on the "behavioral" relations of the words. Words
often occurring in similar environments end up with similar representations in the hidden
layer activation. When an input word is replaced after training by a new word the
network has never seen, the network will internally represent that word in the same way as the
replaced word, since it behaves in the same way. An advantage of distributed representations
with a continuous activation range is that they theoretically provide infinite
memory capacity. That, however, does not mean connectionist systems are able to store
infinitely many equally treated representations, as would be the case in theoretical symbolic
models with infinite capacity. Representations of higher grain size, i.e. involving
a higher activation range, have more influential power with respect to the network's behavior
than more fine-grained activation patterns do. So, what grades memory (and
hence shapes memory capacity) is the relative importance of the stored representations.
This relates to the type-token issue. As explained above, connectionist networks are able
to discern types like lexical categories, but they do not ascribe any meaning to single
tokens. However, that does not mean that connectionist networks do not functionally
distinguish between different tokens. As Elman also showed, tokens of the same type are
represented very similarly but with subtle differences. Notably, these differences do not
only exist between tokens but also within tokens, discerning usages of a token in slightly
differing contexts. In this way a token is a fuzzy category on a continuous scale. This
makes sense philosophically, since the boundary between type and token is often a matter
of point of view, which is why tokens, too, can often be analyzed as having an internal
structure.

3.2 Recursion and Complexity

Recursion is considered to be a language-independent structural phenomenon which causes
processing difficulty. Besides the limit on recursion depth in the human language processor,
there are two rather surprising properties of human processing ability on recursive
structures. First, in a comprehensibility rating study Bach et al. (1986) compared the
difficulty of Dutch cross-dependency versus German center-embedding relative to right-branching
and found cross-dependency to be easier at deeper embedding than center-embedding.
Second, a number of studies (Bach et al., 1986; Blaubergs and Braine,
1974; Christiansen and MacDonald, 1999) showed increasing difficulty for iterated
right-branching. These performance patterns are not predicted by grammatical complexity.
Cross-dependency is seen as especially complex because it is not representable by
context-free phrase structure rules. Repeated right-branching, on the other hand, is not
predicted to affect comprehension in most symbolic models.

Christiansen and Chater (1999) showed in a series of simulations that simple recurrent
networks' word-by-word prediction performance on recursive structures exhibits
exactly the three properties just mentioned for the human comprehender. Christiansen
and Chater assessed the performance of SRNs on different types of grammatical complexity.
In particular, they used the three types of recursion proposed by Chomsky
(1957): counting recursion, mirror recursion (center-embedding) and identity recursion
(cross-dependency). As a baseline they also included iterative right-branching structures.
Christiansen and Chater constructed three artificial languages that each covered
one of the three recursion types plus right-branching. They were used to create training
corpora containing half right-branching, half recursive structures. The SRNs trained
on these corpora performed best on counting recursion during testing, followed by
cross-dependency, and had the most trouble with center-embedding. Similar to the human
results, the advantage of cross-dependency over center-embedding occurred only at embedding
depths of two and upward. In single embedding the two recursion types were handled equally
well, resembling the human results by Bach et al. Also consistent with human data, the SRNs
showed declining performance with deeper embedding on right-branching structures.
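The three recursion types can be sketched as toy string generators (my own illustration;
Christiansen and Chater's actual languages used a richer vocabulary and mixed in
right-branching structures):

    import random

    NOUNS = ["N1", "N2"]   # the index encodes an agreement feature
    VERBS = ["V1", "V2"]

    def counting(depth):
        """Counting recursion: n nouns then n verbs, no agreement pairing."""
        return [random.choice(NOUNS) for _ in range(depth)] + \
               [random.choice(VERBS) for _ in range(depth)]

    def mirror(depth):
        """Mirror recursion (center-embedding): nested agreement pairs."""
        idx = [random.randrange(2) for _ in range(depth)]
        return [NOUNS[i] for i in idx] + [VERBS[i] for i in reversed(idx)]

    def identity(depth):
        """Identity recursion (cross-dependency): pairs in the same order."""
        idx = [random.randrange(2) for _ in range(depth)]
        return [NOUNS[i] for i in idx] + [VERBS[i] for i in idx]

    print(mirror(3))   # e.g. ['N2', 'N1', 'N1', 'V1', 'V1', 'V2']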

The consistency of the SRNs' predictions on recursive structures even included the
phenomenon of the grammaticality illusion in double center-embedding. The SRN simulation
was able to replicate the forgetting effect described in 2.5. After training on
the center-embedding language, the networks were tested on grammatical 3VP and
ungrammatical 2VP constructions. Mean error scores showed that the network favored the
ungrammatical NNNVV sequence over the grammatical NNNVVV sequence. Figure
3.2 shows the SRN's output node activations after seeing the second verb. The high
activation of the end-of-sentence marker (EOS) clearly indicates that the network expects
the sentence to be complete at this point.

Interestingly, in all the simulations the number of hidden units did not affect the
SRNs' performance on recursion, given a number higher than 15. Thus, in contrast to
an often-voiced criticism, the hidden layer size is not comparable to
a capacity parameter in symbolic models, which can theoretically be increased to capture
recursion depth into infinity. But what, then, is memory capacity in a connectionist
network? MacDonald and Christiansen (2002) say the following:

"To the extent that it is useful to talk about working memory within these
systems, it is the network itself; it is not some separate entity that can vary
independently of the architecture and experience that governs the network's
processing efficiency." (p. 38)

Figure 3.2: Ungrammaticality simulation by Christiansen and Chater (1999) (their
figure 10). Mean output activation of the lexical categories (singular and plural nouns,
singular and plural verbs, and EOS), marked as indicating a 2VP preference, a 3VP
preference, or erroneous activation, with error bars indicating standard error.

In their view capacity is only a higher-level description of an SRN's behavior. Changing aspects of the architecture or the training always affects both memory and processing. In contrast, Just and Varma (2002) claim that noisy input to a network would be comparable to changing the capacity limit in symbolic models like CC-READER. They say that in the network the representational quality would be affected while the grammatical knowledge would stay constant. Whichever view is correct, one can probably say that the mechanism of the temporal loop and the backpropagation procedure have the greatest effect on memory capacity. Using different learning algorithms, such as backpropagation through time, can increase the network's memory span.

3.3 A Model of RC Processing

3.3.1 MacDonald and Christiansen (2002)

MacDonald and Christiansen (2002) (MC02) presented a connectionist model covering individual and global differences in relative clause comprehension in English. Mapping the word-by-word prediction performance of an Elman network onto reading times, they showed an impressively accurate fit to the results of King and Just (1991). In doing so, MC02 directly attacked the importance of a discrete memory component for the subject/object difference and for individual differences in the comprehension of relative clauses. Since MC02's model serves as the basis for the simulations in chapter 4, I will here describe their experiment in detail.

MC02 used a standard simple recurrent network (SRN) with a hidden and a context layer of 60 units each. Input and output layers of 31 units each represented 30 words plus an end-of-sentence (EOS) symbol. The corpora each consisted of 10,000 English sentences constructed randomly from a simple artificial probabilistic context-free grammar (PCFG). Subject- or object-modifying relative clauses were contained in 5% of the sentences; half were subject-extracted and half were object-extracted RCs. The rest of each corpus consisted of simple mono-clausal sentences. Verbs differed in transitivity and shared number agreement with their subject nouns. Each corpus consisted of about 55,000 words. Sentence length ranged from 3 to 27 words with a mean of 4.5. Notably, relative clauses could be embedded recursively in each noun phrase; the RC attachment probability in the PCFG (0.05) limited the embedding depth. MC02 trained 10 networks with randomly distributed initial weights1, each on a different corpus. The learning rate was set to 0.1. The training phase covered only three epochs, each consisting of one corpus length. The networks learned to predict the next word in a sentence without being provided with any probabilistic information. The output unit activations were calculated by a cross-entropy algorithm which ensured that all activation values summed to one. In that way the networks' output was comparable to continuation likelihoods assigned to each possible word. After training, the networks were assessed on 10 sentences of each of the three types (SRC, ORC, and simple clause).

For interpreting the network output in terms of processing difficulty, MC02 calculated the so-called grammatical prediction error2 (GPE). The GPE value is a measure of the network's difficulty in making the correct predictions for each word. The measure was then used to map the relative word-by-word differences between the conditions onto reading times from the study by King and Just (1991). Besides RC type, MC02 used training epochs as a second factor: the network performances after one, two, and three epochs of training were compared to low-, mid-, and high-span readers' reading speed.

The results of MC02's network simulation are shown in figure 3.3. Pooled over all three epochs, the results show a clear subject preference on the main verb (praised) and the preceding region (embedded object in the SRC and embedded verb in the ORC). Furthermore, the ORC performance shows significant improvement on the embedded and main verb across the three epochs of training. Notably, the SRC data does not show such an improvement; rather, performance was relatively good from the start with no change during training. This indicates a clause type × exposure interaction. The same interaction (in this case clause type × reading span) is seen in King and Just's empirical data (figure 2.1). Notably, the simple SRN model seems to make better predictions than the CC-READER model by Just and Carpenter (1992), since CC-READER captures the span effect but not the interaction with clause type (see figure 2.4). Importantly, MC02 call this interaction a frequency × regularity interaction. Specifically, the regular nature of English SRCs with respect to word order (SVO) serves as a sort of familiarity boost that makes them easier to handle.

1 Between -0.15 and 0.15.
2 See chapter 4 for a detailed description.

Figure 3.3: Frequency × regularity interaction of SRCs and ORCs. Performance of the model in MacDonald and Christiansen (2002) (their figure 2): grammatical prediction error scores averaged over 10 novel sentences of each kind and grouped into four regions to facilitate comparison with the human data (see Just & Carpenter, 1992, figure 9, p. 140). Error bars show standard error.

As described in section 3.1, the representational similarity (and with it the related prediction behavior) of the words is bound to their occurrence in context. Since the representation of each input word also includes traces of each preceding word (i.e. the currently preceding structure), similar structures result in similar internal representations, which again result in similar prediction behavior. Consequently, the relatively quickly embedded “knowledge”3 about simple sentences (95% of the corpus) also influences the “knowledge” about the structurally similar SRCs.

Theoretically one could invent a structure which is similar to SRCs but which the network has not been exposed to in training. The SRN would nonetheless be able to handle it, benefiting from previous experience. The reason for the absence of the regularity benefit in ORCs is their deviation from all other structures in the corpus. While in the SRC reporter, attacked, and senator appear in the regular NVN ordering, the ORC exhibits an NNV ordering. Specifically, the number agreement serves as a structural hint for differentiating the word orders and helps avoid learning wrong regularities. The network gains implicit information about subject-object relations from the input data through the structural fact that verbs share number agreement with subjects but not with objects.

The simulations show that a purely exposure-based model is able to predict complex interactions of complexity-related reading difficulties and individual differences at the word level. This evidence led MC02 to the formulation of their Skill-through-Experience Account (p. 44), which attacks the modular picture of knowledge and memory.

3 I put knowledge in quotes here because what is meant in a connectionist system is not declarative or explicit knowledge but rather implicit behavioral knowledge provided by the connection weights.


The crucial claim of MC02 is that differences in performance result from processing skill as a function of experience and not from a separable WM capacity.

“In our view, neither knowledge nor capacity are primitives that can vary independently in theory or computational models; rather they emerge from the interaction of network architecture and experience.” (p. 37)

The subsymbolic (and behavioristic) nature of connectionist networks makes grammatical knowledge and processing indistinguishable. A change in parameters such as weight vectors or hidden layer size is not attributable to one of the two components; rather, it affects the behavior of the whole network.

3.3.2 Critique and Relation to other Approaches

MC02 see their model mainly as an opposition to models like Just and Carpenter (1992) and Waters and Caplan (1996), which explicitly account for memory capacity limitations. MC02's SRN simulations have important implications with respect to the biological plausibility of processing models. They demonstrated that there is no need to assume separable working memory and knowledge modules in order to account for effects attributed to these. Rather, experience shapes the whole system, and capacity is a property emergent from the system's architecture. That emphasizes the role of symbolic models like Just and Carpenter (1992) as merely higher-level descriptions of underlying processes. There is, of course, nothing wrong with symbolic descriptions. What is in question, however, is the justification of explicit numerical limits on capacity. In Just and Carpenter's account the capacity limit is defined as the maximal amount of activation attributed to productions (processing rules), and this limit can be varied without touching the rest of the system. Following MC02, however, this value is indeed emergent and inseparable from the entire system. As a consequence, sentence comprehension and reading span measure the same thing, namely reading skill, which is the experience-shaped efficiency of linguistic processes.

Non-Linguistic Working Memory

Not convinced by this view, Roberts and Gibson (2002) note that a pure skill-via-experience account would not be able to explain correlations of comprehension skill with non-linguistic working memory tasks that do not involve sentence reading. Roberts and Gibson provide respective empirical evidence for correlations of sentence memory with several memory load tasks that do not involve reading sentences. Addressing these correlations, MC02 propose that reading skill is tied to phonological representations. These representations play the crucial role in all sorts of memory load tasks and account for individual differences. Regarding phonological representations, MC02 formulate four important claims (p. 45):


(a) “Phonological and articulatory representations must be activated in order to utter the words for the load task”.

(b) “Phonological activation is an important component of written and spoken sentence comprehension, particularly for certain difficult sentence structures”.

(c) “The extent to which phonological representations are important during comprehension of difficult syntactic structures is likely to vary inversely with experience, such that phonological information is more crucial for less experienced comprehenders”.

(d) “There appear to be notable individual differences in the ‘precision’ of phonological representations computed during language comprehension, and these differences are thought to owe both to reading experience and to biological factors.”

As becomes clear, MC02 do not completely deny an influence of biological factors on processing skill. These factors, however, concern the precision of representations, not capacity limitations, and that precision is itself subject to experience-caused variance. Moreover, individual differences are assumed to lie primarily in the degree of dependence on these phonological representations, meaning that highly skilled readers exhibit more efficient processing that does not rely as much on the phonological information. For example, in extrinsic load tasks4 both the stored items and the sentence comprehension processes make use of shared phonological representations. Thus MC02 explain load effects by activation interference rather than activation limits. This is seen as naturally evolving from evidence that articulatory planning involves strict activation and inhibition of phonological units (Bock, 1987; Dell and O'Seaghdha, 1992). During extrinsic load tasks, then, activation and inhibition processes from both the load and the comprehension mechanism operate on the same representations, interfering with each other. The more effective processing of highly experienced readers makes less use of the representation and thus reduces difficulties due to interference. The same processes also occur in the reading span task (which, incidentally, is basically the same task as extrinsic load). The conclusion is that reading span is a function of experience and not of memory capacity. This account is also superior to Waters and Caplan (1996) because their theory assumes two separate working memories and hence does not predict an interaction of comprehension and extrinsic load. Furthermore, RC type differences are explained in the same way, namely that “object relatives, which are more challenging than subject relatives, are likely to rely more on phonological information than subject relatives” (p. 45).

4 In these tasks participants are asked to memorize a set of words or digits and retain it while reading sentences. The extrinsic load influences sentence comprehension performance in a way that correlates with the participants' reading span value (Just and Carpenter, 1992; King and Just, 1991).

German Word Order

Konieczny and Ruh (2003) ran simulations on German relative clauses using the model parameters of MC02. The results are inconsistent with the empirical subject preference: German ORCs clearly exhibit lower error rates on the embedded verb. In addition, the results do not show a frequency × regularity interaction. This is not so surprising considering German word order properties. In English the regularity effect is attributed to the shared SVO ordering of main clauses and SRCs, which separates both structures from the SOV-ordered ORCs. In German, on the other hand, while main clauses commonly exhibit an SVO ordering as in English, the order in SRCs and ORCs is SOV and OSV, respectively. In addition, the free word order of German also allows an OVS structure in main clauses. This makes four different possible word orders in German, which are not expected to give rise to a regularity preferring one of the two RC types. As pointed out in 1.3.3, a canonicity account based on thematic orderings cannot make clear predictions assuming SVO as the canonical ordering. An SOV-canonicity account, on the other hand, would make the correct predictions. However, to derive a structural frequency-based SOV regularity, the structural scope of the model would have to be extended, since simple main clauses do not provide the desired regular structures as they do in English. I suspect that including sentential complements and other subordinate clauses exhibiting an SOV pattern would result in the desired frequency × regularity interaction with an advantage for SRCs. The exact reason for the actual advantage of ORCs over SRCs in the current model will be discussed at the end of the next section, which is concerned in more detail with the SRN's structure-based learning in RC processing.

3.3.3 What is learned?

As laid out in the previous section, the model by MacDonald and Christiansen (2002) was criticized on several points. It is not completely evident how much of the network's predictions can be attributed to a frequency × regularity effect and what are merely artifacts. A mere correlation between network experience and human reading span is not sufficient evidence for an experience effect in human readers. Similarly, the effect of structural regularity differences between SRCs and ORCs does not necessarily cause the preference pattern for human readers. Also, assuming the conclusions drawn from the mentioned correlations are correct, the question remains what exactly the learning effect in relative clauses is based upon. An exposure-based theory driven by structural frequency needs to say something about the specific structural cues essential for shaping the efficiency of RC processing.

Assessing Experience

Wells et al. (2009) designed an experiment that took up the challenge of replicating the effects of experience with human readers, in order to provide a basis for assessing the accuracy of MC02's SRN predictions.

For that purpose two participant groups were formed: an RC experience group and a control experience group, both matched on reading span. Both groups received reading training over 3-4 weeks. The RC experience group received training mainly on SRC and ORC structures whereas the control experience group was exposed to other structures. A pre- and a post-test carried out in the SPR paradigm assessed both groups' RC reading performance. The results show an overall improvement in RC reading. Most importantly, however, the data revealed an interaction of session × experience group × clause type × region, as can be seen in figure 3.4. There was a reliable difference between SRC and ORC reading times in both groups in the pre-test; in particular, the ORC condition was read more slowly. The subject/object difference, however, decreased significantly between pre- and post-test for the RC experience group whereas in the control group it stayed the same. Wells et al. attribute the global reading improvement to increased familiarity with the SPR task. The observed pattern in the experience group resembles, first, the span × clause type interaction in the study by King and Just (1991) reported in 2.1 and, second, the frequency × regularity interaction of the connectionist model in MC02 (see figure 3.5). Statistically, the SRNs' mean GPE scores predicted the within-sentence and experience-based variance in the human data extraordinarily well, with a total fit of GPE and reading times of R² = .75. In a hierarchical regression Wells and colleagues also predicted the SRN simulation results using the human reading times. The RC experience group data accounted for 75% of the overall variance in the SRNs' GPE scores. Using the control group data as a predictor, which did not involve an experience factor but merely within-sentence variance, the regression accounted for only about 65% of the GPE variance.

The impressive result of Wells et al.'s study delivers empirical support for the implications drawn by MC02 from the SRN simulations, namely that experience can account for individual differences in reading skill. Notably, the Wells et al. study demonstrated a significant experience effect after a minimal amount of exposure (the training sets contained only 160 sentences in total). Furthermore, lexical and structural short-term priming effects were excluded by, first, a four-week distance between the tests, second, no lexical overlap between the tests, and third, the usage of RC constructions during training that were structurally different from the testing items. Task-related adaptation can also be excluded, since during the training phase a full-sentence display was used instead of word-by-word reading.

Now, given the evidence that reading skill on certain sentence structures is affected by previous experience with these structures, the question remains what has been learned; a question that evolves from the granularity problem of exposure-based theories described in section 1.3.4. Wells and colleagues identify verb transitivity as a crucial factor driving the learning process of the SRN. In ORCs, on the one hand, embedded verbs are necessarily transitive because the head noun has to fill an object role. On the other hand, simple sentences and SRCs occur with transitive and intransitive verbs. As a consequence, only for continuation predictions in ORCs does the network have to learn to differentiate verb transitivity, whereas for the continuation of other structures transitive and intransitive verbs are equally probable.



Figure 3.4: Wells et al. (2009) reading times for pre- and post-test by group and RC type (their figure 1: Self-Paced Reading Patterns at Pre- and Posttest). Length-adjusted reading times (ms) in four regions for the RC Experience Group (n=32) and the Control Experience Group (n=32), for object relatives (“(The) clerk that the typist trained told the truth”) and subject relatives (“(The) clerk that trained the typist told the truth”).

Example (20) shows structural prefixes in main clauses (20a), SRCs (20b), and ORCs (20c) with the possible transitivity properties of predicted verbs.

(20) a. Simple: EOS the N . . . {trans/intrans}
b. SRC: the N that . . . {trans/intrans}
c. ORC: (the N) that the N . . . {trans}

The experience effect in human readers, however, can be affected by many more structural cues in the input. A crucial factor, also mentioned for Chinese RCs in section 2.4, is animacy. In English, as in Mandarin, ORCs mostly contain inanimate head nouns whereas in SRCs head nouns are commonly animate. Since Wells et al. only used animate head nouns for both SRCs and ORCs, the participants might have learned to handle the non-canonical animate-headed ORCs. Race and MacDonald (2003), Reali and Christiansen (2007a), and Reali and Christiansen (2007b) identify further probabilistic constraints that correlate with SRC/ORC corpus distributions. For example, pronominal ORCs mostly contain personal pronouns whereas impersonal pronouns occur more frequently in SRCs. Furthermore, there are differences in the NP type of the embedded subjects that separate ORCs from other structures exhibiting an ‘NP that NPSubj VP’ sequence: the usage of a pronoun in the NPSubj position is highly correlated with ORCs. The Wells et al. study used only common nouns, and all RCs were headed by the impersonal pronoun that; both are potential properties subject to probabilistic learning, since they deviate from natural frequency patterns.


Figure 3.5: Wells et al. (2009) reading times at pre- and post-test compared to the GPE patterns obtained for the SRNs in MacDonald and Christiansen (2002) (Wells et al.'s figure 2).

The unnatural ORC patterns are probably the reason for an experience effect over such a short interval of only 4 weeks. Additionally, the unnaturalness might have inflated the clause type effect in reading times. I suspect that a study with completely natural sentences would show a smaller clause type difference and would need a strongly extended training period to show experience effects. However, this small-scale study clearly demonstrates the effect of structural probabilistic constraints on reading skill. Comparability with the SRN simulations is given by the transitivity constraint in ORCs, which is a probable factor both in the simulations and in the human study.

A Detailed Prediction Analysis

The assumption that the SRN mainly learns the verb transitivity in ORCs is, however, only speculation. Konieczny and Ruh (2003) carried out a more detailed analysis of the word-by-word predictions of MC02's SRN. They found that the high GPE scores on the embedded verb in the ORC and the matrix verb in both RC types were caused by lexical misclassifications and local coherence. In particular, after seeing the ORC sequence ‘the N that the N’ the SRN predicts the end-of-sentence marker EOS in early training. In later epochs the activation of EOS is reduced and that of the correct verbs increased. Notably, the predictions for incorrect verbs (in this case all intransitive verbs plus those with non-matching number) do not change over training. This pattern is inconsistent with the transitivity hypothesis: the experience-based decrease of error on the ORC embedded verb is seemingly not caused by learning the distinction between transitive and intransitive verbs but rather by learning to separate the pronoun that from verbs.

The other region showing effects of clause type and experience is the main verb.


At this point the SRN strongly predicts a determiner after seeing an ORC. This points to a locally consistent interpretation of the embedded ‘. . . the N Vtrans’ sequence as a main clause prefix continuing with an NP. This misinterpretation is reduced in later training epochs. An additional and very stable false prediction on the main verb is the EOS after embedded SRCs and ORCs. Concerning the SRC, this is consistent with a locally coherent interpretation of the SRC sequence ‘. . . Vtrans the N’ as part of a main clause. In the ORC, on the other hand, the EOS prediction after the ‘. . . the N Vtrans’ sequence is only locally consistent when interpreting the transitive verb as intransitive. At first sight this seems consistent with the assumption of Wells et al. (2009) that the SRN has to learn the trans/intrans difference. But surprisingly the wrong EOS prediction increases with further training, indicating that the network does not recognize the transitivity of the embedded verb. Summarizing the analysis of Konieczny and Ruh, what causes the effects on the embedded and main verb is a) the interpretation of that as a verb, b) the prediction of the sentence ending after an embedded RC due to local coherence, and c) the failure to classify transitive and intransitive verbs. Konieczny and Ruh suggest removing verbs that can be both transitive and intransitive from the lexicon in order to separate the two classes more clearly. Furthermore, the grammar should allow the use of pronominal NPs to move the classification of that nearer to nouns than to verbs.

Concerning the German RC simulations, the explanation of the effects is quite simple. German SRCs and ORCs differ only in the serial order of the relative pronoun and the determiner of the embedded NP. Consequently, the SRC contains in this region a NOM-ACC sequence whereas the ORC contains an ACC-NOM sequence. The embedded verb always agrees with the nominative (der). This produces a locally consistent structure of ‘detnom N V’ in the ORC but not in the SRC. Following Konieczny and Ruh, this local consistency effect produces the correct predictions for the embedded verb in the ORC, which is the reason for the lower error. In the SRC the verb is bound to the relative pronoun, which shares its number with the matrix subject. The SRN's verb predictions, however, seem to be more influenced by the number of the intervening object than by the distant dependency.
the distant dependency.<br />

3.3.4 Summary

In using SRNs, MacDonald and Christiansen (2002) take advantage of a simple mechanism that, without architectural predesign, makes excellent predictions concerning the functional relation between exposure to certain structures and processing skill. The model's behavior is interpretable in terms of memory and decay, but due to its temporal loop and learning mechanism it is sensitive to context and experience. The King and Just data was fitted well, especially regarding individual differences. The model results in combination with the study by Wells et al. (2009) provide a comprehensive skill-through-experience account that includes individual and language-specific differences. Konieczny and Ruh (2003) and others question the model's validity, partly because a detailed analysis shows that the learned constraints are of a local nature and not comparable to human learning.



However, Christiansen and Chater (1999) showed that an SRN is an outstanding predictor, especially for complex embedding issues. Chapter 4 will now show useful SRN predictions on two further topics.


Chapter 4

Two SRN Prediction Studies

This chapter is concerned with the connectionist simulation of the subject/object difference in Mandarin and the forgetting effect in English and German. As the previous chapters have shown, detailed predictions of an experience-based account are important for both the subject/object difference in Chinese and the language-specific forgetting effect. For the forgetting effect, a structural experience account seems to be a promising predictor capturing the divergence of the effects in different languages. Similarly, in the question of the Mandarin extraction preference, structural regularities are considered an important explanatory factor. As discussed in 1.3, the problem of theories involving structural experience or canonicity is the justification of certain granularity commitments. Chapter 3 emphasized that an implementation of the experience account in a connectionist network model deals with the granularity problem in a natural way, leaving it to the learning process to extract structural information at the granularity level that best serves the solution of the task. The network model by MacDonald and Christiansen (2002) (MC02), discussed in chapter 3 as a prototypical structural experience implementation, has proven to make empirically consistent predictions regarding the subject/object difference and individual differences. On that basis I rebuilt the model and used it to address the above-mentioned issues that were waiting for a connectionist answer. In order to verify my implementation of the model, I present replications of the English and German simulations before reporting the new simulations. But first I will briefly introduce the structure of the network.

4.1 The Model

4.1.1 Network Architecture

As in MC02, the connectionist architecture used was a simple recurrent network (SRN; Elman, 1990) as described in chapter 3. All networks were built, trained, and tested in the Tlearn simulator (Elman, 1992) on a Windows platform. The SRN consisted of four layers.


The Input Layer In a localist input representation the number of input nodes depends on the number of words in the lexicon. Each input node represents one word. A word is encoded by activating the node representing the input word with 1 and all others with 0. The MC02 replication model used 31 input nodes for 30 words and the end-of-sentence marker (EOS).

The Output Layer The output layer had the same number of units as the input layer. Output units, however, could take activations on a continuous scale between zero and one. The output calculation used the cross-entropy method, which guarantees that all output activations sum to one. This method makes it possible to map output activations of units directly onto continuation probabilities.

The Hidden Layer The hidden layer holds the internal representations of the network. It consisted of 60 units, which receive an all-to-all connection from the input units and connect in the same way to the output units. Depending on the weights of the incoming connections, hidden units received activations between zero and one. Across all simulations the hidden layer size was held constant: as Christiansen and Chater (1999) demonstrated, with layer sizes from about 15 units upward the number of hidden units does not significantly influence the performance of the network on recursive embedding. Hence, for the simulations presented here the size of the hidden and context layer stayed untouched.

The Context Layer The context layer contained 60 units that received a one-to-one connection from the hidden units. To obtain the copy mechanism, the link weights connecting from the hidden layer were fixed to 1. In that way, at every time step the context units received an exact copy of the hidden layer's activations. In an SRN the back-projection into the hidden layer happens in an all-to-all fashion, thus providing the next input step with indirect context from previous calculations.
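To make this architecture concrete, the following is a minimal sketch of the forward pass of such an SRN in Python with NumPy (for illustration only; the actual simulations were run in Tlearn). The layer sizes and the weight range follow the description in this chapter, the softmax function stands in for the cross-entropy normalization that makes the output activations sum to one, and all names are my own.

    import numpy as np

    rng = np.random.default_rng(0)

    N_WORDS = 31   # 30 words plus the EOS marker (one localist node per word)
    N_HIDDEN = 60  # hidden and context layer size

    # Trainable weights; the hidden-to-context copy connection is fixed to 1
    # and therefore needs no weight matrix here.
    W_ih = rng.uniform(-0.15, 0.15, (N_HIDDEN, N_WORDS))   # input -> hidden
    W_ch = rng.uniform(-0.15, 0.15, (N_HIDDEN, N_HIDDEN))  # context -> hidden
    W_ho = rng.uniform(-0.15, 0.15, (N_WORDS, N_HIDDEN))   # hidden -> output
    b_h = np.full(N_HIDDEN, 0.5)  # initial hidden bias activation of 0.5
    b_o = np.zeros(N_WORDS)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def forward(word, context):
        """One time step: localist input plus context copy -> next-word distribution."""
        x = np.zeros(N_WORDS)
        x[word] = 1.0                                 # localist encoding of the input word
        h = sigmoid(W_ih @ x + W_ch @ context + b_h)  # internal representation
        o = softmax(W_ho @ h + b_o)                   # output activations sum to one
        return o, h                                   # h is copied into the context layer

    # A sentence is processed word by word, carrying the context along:
    context = np.zeros(N_HIDDEN)
    for word in [3, 17, 5]:  # hypothetical word indices
        prediction, context = forward(word, context)

The copy-and-feed-back of the hidden activations is what gives the network its indirect memory of previous words; no other state is carried between time steps.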

4.1.2 Grammar and Corpora

Simple probabilistic context-free grammars (PCFGs) were used, covering simple sentences and subject- and object-extracted RCs in all three languages: English, German, and Chinese. For the standard English and German training I used the grammars designed by Lars Konieczny (English) and Daniel Müller and Lars Konieczny (German). The relative clause distribution was adjusted across different experiments. For generating corpora and likelihood predictions the Simple Language Generator (SLG; Rohde, 1999) was used. The three training grammars as represented in SLG can be found in the Appendix. Every training corpus consisted of 10,000 randomly generated sentences. Test corpora were generated for every condition, consisting of 10 test sentences each.
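As an illustration of how such corpora are produced, here is a toy PCFG expansion in the same Python style. The rules and probabilities below are invented for the sketch (the actual grammars are the SLG grammars in the Appendix); only the 5% RC attachment probability and the corpus size of 10,000 sentences follow the description above, and number agreement is omitted for brevity.

    import random

    # Toy PCFG: each nonterminal maps to (probability, right-hand side) pairs.
    GRAMMAR = {
        "S":  [(1.0, ["NP", "VP", "EOS"])],
        "NP": [(0.95, ["the", "N"]),
               (0.05, ["the", "N", "RC"])],  # 5% RC attachment limits embedding depth
        "RC": [(0.5, ["that", "VP"]),        # subject-extracted RC
               (0.5, ["that", "NP", "V"])],  # object-extracted RC
        "VP": [(0.5, ["V"]),                 # intransitive use
               (0.5, ["V", "NP"])],          # transitive use
        "N":  [(0.5, ["reporter"]), (0.5, ["senator"])],
        "V":  [(0.5, ["attacked"]), (0.5, ["praised"])],
    }

    def choose(options):
        """Pick a right-hand side according to the rule probabilities."""
        r = random.random()
        for p, rhs in options:
            if r < p:
                return rhs
            r -= p
        return options[-1][1]  # guard against floating-point rounding

    def expand(symbol):
        """Recursively expand a symbol into a flat list of words."""
        if symbol not in GRAMMAR:
            return [symbol]  # terminal
        return [word for s in choose(GRAMMAR[symbol]) for word in expand(s)]

    corpus = [expand("S") for _ in range(10000)]  # 10,000 sentences per corpus

Because NPs can expand to RCs that themselves contain NPs, relative clauses can embed recursively, with the low attachment probability keeping deep embeddings rare.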

4.1.3 Training and Testing

Prior to tra<strong>in</strong><strong>in</strong>g, all networks were <strong>in</strong>itialized with random connection weights <strong>in</strong> the<br />

range <strong>of</strong> [-0.15, 0.15] and the hidden units received an <strong>in</strong>itial bias activation <strong>of</strong> 0.5. Each<br />

tra<strong>in</strong><strong>in</strong>g <strong>in</strong>cluded 10 <strong>in</strong>dividually <strong>in</strong>itialized networks that were tra<strong>in</strong>ed on 10 different<br />

corpora, respectively. In do<strong>in</strong>g so, statistical justification was achieved by simulat<strong>in</strong>g<br />

subjects <strong>of</strong> differ<strong>in</strong>g disposition exposed to non-identical material. The networks were<br />

tra<strong>in</strong>ed for three epochs, where one epoch corresponded to a full run through a corpus.<br />

In the following, the training mechanism is briefly described. With every input word the weighted connections propagate activation and inhibition through the net, forming an activation pattern in the hidden layer which in turn is responsible for the output pattern. The error is then calculated against a target in which only the node corresponding to the subsequent word in the current sentence is activated with value 1. Thus, the network is trained to deterministically predict the next word, which is, of course, impossible to achieve. Similar input will have different continuations, but the teaching signal treats every continuation as the one and only for the current context. Consequently, after some examples the network will activate several words, with activation strengths qualitatively corresponding to the "teacher's" previous single lessons. In combination with the cross-entropy error calculation (output activations will sum to 1), the activation distribution over the output nodes will be comparable to a probability distribution over words. Here the grammatical prediction error (GPE; Christiansen and Chater, 1999) comes into play. The GPE algorithm is based on the numerical differences between the desired PCFG-corresponding probabilities and the actual output. The GPE value is a difficulty measure for every word in the sentence, which can be used as a reading time predictor.
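The following sketch illustrates this teaching signal: a softmax output trained with cross-entropy against a one-hot vector for the word that actually follows. The function names are illustrative and not taken from the Perl implementation actually used here.

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def next_word_error(output_preact, next_word_index):
    # Cross-entropy error against a deterministic next-word target.
    p = softmax(output_preact)          # output activations sum to 1
    target = np.zeros_like(p)
    target[next_word_index] = 1.0       # the one observed continuation
    loss = -np.log(p[next_word_index])  # cross-entropy with one-hot target
    grad = p - target                   # gradient w.r.t. the pre-activations
    return loss, grad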

The Grammatical Prediction Error The GPE measure as described in Christiansen and Chater (1999) assigns an error score between 0 and 1 to every output activation pattern. With all correct units receiving the correct amount of activation and no incorrect units being activated, the output would receive a GPE value of 0. If, on the other hand, no units are correctly activated, the GPE would be 1. The score is calculated by formulas (4.1)–(4.5), where $u_i$ is the activation of unit $i$; $G$ and $U$ are the sets of grammatical and ungrammatical units, respectively; and $t_i$ is the desired target activation of unit $i$. The formulas ensure that incorrectly or insufficiently activated output units are penalized by an increased score. Since all activations sum to 1, over-activation is penalized as well, because the surplus activation is missing at another unit. The activation of correctly activated units $H$ (hits) and of incorrectly activated units $F$ (false alarms) sums to the total activation. Additionally, $M$ is the sum of all missing activation $m_i$, which is the discrepancy of an under-predicted unit's activation $u_i$ with respect to the target activation $t_i$. The target activation $t_i$ of a unit $i$ is given by the likelihood of the respective word in the current context string, calculated from the probabilistic grammar. The GPE is then defined by the proportion of correctly activated units ($H$) to the total activation plus a penalty for misses (4.5).

Hits (correctly activated units):
\[ H = \sum_{i \in G} u_i \tag{4.1} \]

False alarms (incorrectly activated units):
\[ F = \sum_{i \in U} u_i \tag{4.2} \]

Missing activation of a unit:
\[ m_i = \begin{cases} 0 & \text{if } t_i - u_i \le 0 \\ t_i - u_i & \text{otherwise} \end{cases} \tag{4.3} \]

Misses (units with underestimated activation):
\[ M = \sum_{i \in G} (H + F)\, m_i \tag{4.4} \]

\[ \mathrm{GPE} = 1 - \frac{H}{H + F + M} \tag{4.5} \]
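As a sketch, equations (4.1)–(4.5) translate directly into the following function; with the activations and targets listed later in figure 4.3 it reproduces the GPE value of about 0.602 reported there. The vectorized NumPy formulation is my own, not the Perl routine actually used.

import numpy as np

def gpe(u, t, grammatical):
    # u: output activations; t: target (PCFG) probabilities;
    # grammatical: boolean mask marking the set G of grammatical units.
    G = np.asarray(grammatical, dtype=bool)
    H = u[G].sum()                 # (4.1) hit activation
    F = u[~G].sum()                # (4.2) false-alarm activation
    m = np.clip(t - u, 0.0, None)  # (4.3) missing activation per unit
    M = ((H + F) * m[G]).sum()     # (4.4) total misses, scaled by H + F
    return 1.0 - H / (H + F + M)   # (4.5)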

A Perl routine controlled training and testing of the ten networks and then calculated the region-specific GPEs. The correct functioning of the process will now be validated by the replication of two previous studies.

4.2 Replication of English and German RC Processing

I built the model with the parameters specified by MacDonald and Christiansen (2002) and tried to replicate their results (see figure 3.3 for MC02's results). MC02 report an RC probability of 0.05. However, the replication fitted their data better when the RC probability was set to 0.1. Konieczny and Ruh (2003) also replicated MC02 with an RC probability of 0.1. Figure 4.1 shows the replication result. The pattern of MC02 was matched more exactly in epochs 3, 4, and 5, but the relevant interactions were already found in epochs 1, 2, and 3. Only the training effect on the main verb in the ORC was not very pronounced. The differences were, however, significant.

I used the simplified German grammar from Konieczny and Ruh (2003) to replicate their results. Compared to the original study, I obtained lower error rates for the main verb in both conditions. Additionally, the replication showed a significant experience effect in all regions of the SRC, which was not the case in the original. The pattern by region was successfully matched.

I will not go into details regarding the two replication studies. They merely provide the basis for the following simulations, ensuring that the model used here has properties similar to the models in MacDonald and Christiansen (2002) and Konieczny and Ruh (2003).


[Figure 4.1: Replication of MacDonald and Christiansen (2002). GPE (0–1) by region for English SRC (rep. that attacked the senator praised the judge) and English ORC (rep. that the sen. attacked praised the judge), epochs 1–3.]

4.3 RC Extraction in Mandarin

4.3.1 Simulation 1: Regularity

Model Parameters

The first simulation should assess the degree of the regularity advantage the ORC receives due to its canonical word order. A regularity effect is assessable only when the frequencies of both RC types in the corpora are identical. Therefore the SRC and the ORC received the same probability in the generation grammar. Although the replications were done with an RC probability of 0.1, I used the original value of 0.05, reported in MacDonald and Christiansen (2002), for the Mandarin regularity simulation. Compared to English, the Chinese grammar used here is very simple. Setting the RC probability too high could speed up the learning process in a way that conceals training effects.

[Figure 4.2: Replication of Konieczny and Ruh (2003). GPE (0–1) by region for German SRC (der den Passant trifft verspottet) and German ORC (den der Passant trifft verspottet), epochs 1–3.]

The Grammar The grammar used to generate the corpora covered simple regular Mandarin SVO sentences as well as SR and OR clauses. Relative clause attachment could happen at every noun with a probability of 0.05. The embedding depth was theoretically unlimited, but with the small attachment probability of 0.05 the longest sentence in the corpora had a length of 16 words. The 17-word lexicon consisted of 9 plural and singular nouns, three transitive and four intransitive verbs, of which one (lijie "understand") belongs to both categories, the relativizer de, and the EOS. Note, however, that there is no number agreement between nouns and verbs in Mandarin. The full lexicon is given in the Appendix. Note further that in normal Mandarin, intransitive main clauses are closed by the marker le (see example 21). I did not use it in the grammar, however, in order not to blur the transitivity constraints, since le could appear to the network to be a noun.

(21) faguan youyu le .

Ten networks were trained on ten randomly generated corpora. Three test corpora were randomly generated, containing simple transitive SVO sentences, SRCs, and ORCs, respectively. The respective RC test corpora only contained singly embedded RCs with transitive verbs. It was made sure that none of the test sentences appeared in the training sets.

(22) Test set examples:
a. gongji fayanren de lvshi biaoyang lvshimen . (SRC)
b. fayanren lijie de lvshi biaoyang yinhangjia . (ORC)

The SRN's task was to predict the next word in a sentence. For example, in an SRC, after the verb biaoyang "praise" and then the noun lushi "lawyer" have been seen, the target activation for the relativizer de is 0.975. The activation pattern is shown in figure 4.3. Besides the relativizer, the transitive verbs are expected to be activated. The reason is that there is a low but real possibility of an ORC modifying the object inside the SRC (see example 23). This is in fact the only possible continuation apart from de following a 'V N' sequence. By contrast, in the ORC, following the first two words 'N V', all nodes apart from the EOS are activation targets, because many different continuations are possible.



(23) [V1 [N1 V2 de ORC ] N2 de SRC ] N3

Rec-Act   Target-Act   m_i          Chinese           English
0.688     0            0            (EOS)             (EOS)
0.003     0.00833333   0.00533333   (biaoyang)        (praise)
0.000     0            0            (dadianhua)       (phone)
0.713     0.975        0.262        (de)              (gen)
0.001     0            0            (faguan)          (judge)
0.001     0            0            (faguanmen)       (judges)
0.001     0            0            (fayanren)        (reporter/reporters)
0.003     0.00833333   0.00533333   (gongji)          (attack)
0.001     0            0            (guanyuan)        (senator)
0.001     0            0            (guanyuanmen)     (senators)
0.002     0.00833333   0.00633333   (lijie)           (understand)
0.001     0            0            (lushi)           (lawyer)
0.001     0            0            (lushimen)        (lawyers)
0.000     0            0            (sahuang)         (lie)
0.001     0            0            (yinhangjia)      (banker)
0.001     0            0            (yinhangjiamen)   (bankers)
0.000     0            0            (youyu)           (hesitate)

H: 0.721, F: 0.697, GPE: 0.602452988750017

Figure 4.3: Output node activations on the relativizer in Mandarin after the input sequence 'biaoyang lushi'.

Predictions

The aim of the first simulation was to verify the regularity (or canonicity) argument for Mandarin object relatives. As Hsiao and Gibson (2003) have stated, ORCs have a more canonical word order than SRCs and should therefore be easier. The expected outcome would be a pattern reversed relative to the English and German results. The performance on the ORC should be relatively good and stable from the first training cycle on, whereas the SRC performance should be subject to improvement throughout training.

Results

Words 2 to 5 (N1/V1 de N2 V2 N3) were selected as regions of interest. Since the assessed performance is prediction, there is no point in testing the initial word in a sentence. The network's prediction for the initial word will always be the same, and the difference to the desired prediction will not tell us anything about structural matters.


The GPE score measured on a certain word only tells us how the predictions based on previous words fit the probabilistic grammar. It does not include any effect of the current word itself.

[Figure 4.4: Simulation 1: Mandarin ORC regularity. GPE (0–1) by region for Mandarin SRC (N1 de N2 V2 N3) and Mandarin ORC (V1 de N2 V2 N3), epochs 1–3.]

See figure 4.4 for GPE scores of SRCs and ORCs by training epoch. For means and standard errors see table A.1 in the appendix. Collapsing over all regions and epochs, there was a significant advantage for object relatives. The difference shrank with increasing epochs. For the ORC there was significant improvement on the main verb over the three epochs. The SRC improved on the main verb and the relativizer. On the first region (N1/V1) there was a marginal advantage for the ORC in the first epoch. The second region (de) showed a significant object advantage in all epochs. There was also an object advantage on region 4 (V2), which, however, disappeared after the second epoch due to SRC improvement. Regions three and five did not show any effect.

The results of experiment 1 showed the predicted frequency × regularity interaction. In contrast to the English results by MC02, the regularity effect is seen in object relatives in Mandarin. The effect, however, is not located on the embedded RC but mainly on the relativizer. It seems as if the predictions for position 4 (here the relativizer) are easier for the ORC because of the familiarity with the sequence 'N V . . . ', where the relativizer should have a quite low continuation probability due to the small RC frequency in the corpus. On the other hand, the SRC sequence 'V N' occurs only rarely at the beginning of a sentence, making more training necessary to learn the correct predictions. Over training, the network has to learn to assign a high activation to the relativizer after a 'V N' sequence and to exclude almost all other words as continuations.

Experiment 1 superficially confirms the ORC regularity hypothesis. However, as the corpus study by Kuo and Vasishth (2007) revealed, there are many more structures in the corpus that resemble the SRC-typical 'V N de N' sequence. It has to be recognized that the granularity problem also applies to connectionist networks, namely in the choice of input structures. Assume a structure is regular with respect to main clauses; once a great number of structures that fundamentally differ from the main clause is taken into account, their word order has considerable influence on the respective regularity. German, for example, is considered to have an SOV regularity although most main clauses are built with SVO. Experiment 2 will assess the qualitative impact of a displaced regularity in favor of the SRC structure.

4.3.2 Simulation 2: Frequency

Model Parameters

Hsiao and Gibson (2003) performed a corpus study yielding only a small difference in the occurrences of subject vs. object relatives: of all RCs they found, about 57.5% were SRCs and 42.5% were ORCs. However, as reported in section 2.2, there are more constructions with an RC-like pattern that are in fact not RCs. The purely syntactic account of structural frequency addressed here does not distinguish between homomorphic sequences yielding different interpretations. Consequently, the corpus frequencies of all RC-like structures will be considered in this experiment. The corpus study by Kuo and Vasishth (2007) found 639 occurrences of SRC-like sequences such as 'V N1 de N2' and only 117 of 'N1 V de N2' (ORC). That makes a total of 756 RC-like structures, of which about 84.5% exhibit the SRC pattern. In addition, the possibility of main clauses with elided subjects is considered an influence on early parsing decisions (Kuo and Vasishth, 2007). This could further increase the familiarity with 'V N . . . ' sequences and hence facilitate the SRC. To account for the high number of SRC-like structures, the probability of RC attachment in the grammar was set to 0.1 and the SRC probability was set to 0.85. In that way the number of ORCs in the corpus was only slightly lowered, whereas the number of SRCs increased by about 60%. Extra main clauses with missing subjects were not added to the grammar. No further changes were made.

Predictions

The implemented discrepancy between ORC and SRC frequency should in principle account for the structural properties in the corpus, decreasing the familiarity effect on ORCs. The SRC pattern should be easier to predict on the embedded noun and the relativizer. Thus, the object advantage seen in simulation 1 should decrease. The size of the effect is expected to be small because the distributional changes regarding RC probability are just a statistical approximation of given corpus data.

Results

Figure 4.5 shows the GPE scores by region and epoch for SRCs and ORCs. Table A.2 (in the appendix) shows means and standard errors for the first two regions. The improvement on the main verb in the ORC over epochs was comparatively small. There was no improvement on the SRC main verb. The training improvement on the relativizer in the SRC happened predominantly between the first and the second epoch. An object advantage on the relativizer was present only in the first and, slightly, in the second epoch. The third epoch did not reveal an ORC advantage. In addition, a subject preference was found on the pre-relativizer region in the second and third epoch (p < 0.001).

[Figure 4.5: Simulation 2: Mandarin SRC frequency. GPE (0–1) by region for Mandarin SRC (85%) (N1 de N2 V2 N3) and Mandarin ORC (25%) (V1 de N2 V2 N3), epochs 1–3.]

The greatly improved predictions for the relativizer in the SRC imply an increased familiarity effect due to the corpus containing more SRCs than before. The result of simulation 2 suggests that the regularity effect on object relatives is weak and can easily be affected by a slight distributional disproportion in favor of the SRC structure.

4.3.3 Discussion

Simulation 1 confirms the regularity advantage for Mandarin ORCs with respect to simple sentences, with the greatest effect on the relativizer. However, the location of the effect is not consistent with human data. Recall that Hsiao and Gibson (2003) found an object preference on the pre-relativizer region, whereas Kuo and Vasishth (2007) found a subject preference on the relativizer and the head noun. Besides the very small effect on N1/V1 in the simulation, there is no region-specific consistency with empirical studies.


The impact of structural regularity is rather disconfirmed by several studies finding a subject advantage on the relativizer.

By changing the RC type proportions in favor of the SRC in simulation 2, the object advantage decreased dramatically. The RC region showed a subject advantage after two training epochs. This finding, too, is inconsistent with human data: in empirical studies, only an object preference was found on the RC region (Hsiao and Gibson, 2003; Lin and Garnsey, 2007; Qiao and Forster, 2008). The assessment of frequency effects in simulation 2 is to be understood as a tentative approach to account for the complex interplay of statistical constraints that drive learning. Direct predictions for sentence processing patterns may, however, not be justified. An SRN-based regularity test like the one in simulation 1 is more or less straightforward as long as the structures in question are clearly defined. But the structural choice may not reflect the regularity relations that really influence skill in human readers. In order to gain more precise predictions, further corpus inspections are necessary. For example, the exact proportion of RC-like structures or elided-subject clauses with respect to the whole corpus was neglected in the present study but could potentially have influenced the results.

Note that the regularity pattern in Mandarin as revealed by the simulations is not easily comparable with the English simulation. In English there were effects of difficulty mainly occurring on the verbs. This is due to the number agreement between subject and predicate. Notably, between the verb and its direct object no agreement is necessary. As a side effect, this agreement pattern delivers a sort of semantic information, comparable to thematic roles. Agreement thereby gives rise to a simulation of integration-difficulty effects, arising from the need to relate verbs to their subjects. I hypothesize that these "integration effects" are the main reason for the close region-by-region fit to the human data. Mandarin, on the other hand, does not contain specific noun-verb dependencies. In a sense, the network just needs to count nouns and verbs instead of establishing pairwise relationships. Thus, the Mandarin-trained network is not required to deal with the concept of a sentential subject. Consequently, no "integration difficulty" comparable to English is expected. Of course, this is not comparable to human processing of Mandarin. Predicates and their arguments are indeed involved in dependencies like thematic roles and other semantic relationships. It is conceivable that the absence of specific noun-verb relationships is the reason for the missing pattern match between the Mandarin simulation and human data. Implementing the missing dependencies, analogous to the simplified English grammar, seems a straightforward way to test that hypothesis. The interpretability of the result of such a simulation would, however, be questionable.

A possible interpretation of the overall contradictory simulation results with respect to human data is that the effects observed here do not in fact reflect experience-relevant regularities. Assuming, on the other hand, that the simulation results are interpretable as showing regularity properties that play a role in human sentence comprehension, there are two possible interpretations of the results: a) Assuming regularity plays a role in the extraction preference, the fact that the regularity effect on the ORC was very weak in the simulations could be one of the reasons for the inconclusive empirical results. b) On the other hand, it is possible that regularity does not have a relevant impact on empirical studies of Mandarin extraction preferences, and the explanation is left to other factors.

4.4 Forgetting Effects

4.4.1 The Model

As presented in chapter 3, the forgetting effect in center-embedded structures was addressed in a connectionist study by Christiansen and Chater (1999). They trained an SRN on right-branching and center-embedding structures and then assessed the output node activations after seeing the sequence NNNVV. The activations showed a clear 2VP preference, consistent with empirical data from English speakers. The artificial language, which covered center-embedding abba and right-branching aabb dependency patterns, is directly comparable to the simple English grammar of object and subject relative clauses used by MacDonald and Christiansen (2002). Thus, it should be possible to replicate the effect with the SRNs trained on the English grammar for the replication in section 4.2. In German RCs, however, no real right-branching occurs, given that the embedded RC is always attached to its head noun. Hence, in the German grammar used in section 4.2 both ORC and SRC exhibit a center-embedding abba pattern. This fact could result in the SRN exposed to a German grammar being trained more on verb-final center-embedding structures than its English counterpart, resulting in different predictions for an NNNVV sequence. Supposing that the difference in SRC realization in the corpora approximately reflects an essential word order regularity difference between German and English, the SRN predictions will shed light on the part that experience plays in the explanation of the forgetting effect.

I extended the study by Christiansen and Chater (1999) to gain GPE values for both conditions on all regions after the missing verb. In order to achieve that, it was necessary to have a grammar that simulates the forgetting effect, that is, one that allows NNNVV sequences to be complete. Thus, in the probability table for the drop-V2 testing corpus the column referring to the position of V2 was deleted. In consequence, the testing probabilities corresponded to an 'N1 N2 N3 V3 V1' grammar with the first verb (V3) being bound to N3 by number agreement and the second verb (V1) to N1. This is equivalent to forgetting the prediction induced by N2. The GPE for the ungrammatical conditions was calculated against these drop-V2 probabilities. So, if the network is making grammatical predictions, the error values for V1 and subsequent regions should be higher in the drop-V2 condition. On V1 the SRN would predict a verb in number agreement with N2. Then the network would predict another verb, but the test grammar predicts the determiner. After this point the network's predictions should be completely confused, because the sequence just observed is inconsistent with any structural generalizations developed during training. If the network's predictions are not too locally dependent, the predictions should be wrong for the last word (the direct object of the main clause), too.
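As a minimal sketch of this manipulation: the per-position target distributions of the grammatical sequence are kept, except that the column belonging to the V2 position is deleted, so the remaining targets line up with the shortened drop-V2 stimuli. The list-of-dicts representation and the position index are illustrative assumptions about how such a probability table could be stored.

def drop_v2_targets(target_columns, v2_position):
    # target_columns: one target distribution (word -> probability) per
    # position of the grammatical N1 N2 N3 V3 V2 V1 ... sequence.
    # Removing the V2 column yields the test probabilities against which
    # the ungrammatical (drop-V2) condition is scored.
    return [col for i, col in enumerate(target_columns) if i != v2_position]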


However, assuming the forgetting hypothesis, the GPE values would look different. The forgetting hypothesis would mean for the SRN that it is unable to make correct predictions based on long-distance dependencies but bases its predictions on rather locally consistent sequences. For example, after seeing V3 the network only predicts one more verb, because the observation of N1 is too weakly encoded in the hidden representations to influence the predictions. Consequently, on V1 the error for the drop-V2 condition should be lower, because in the grammatical condition V1 is the third verb, which is inconsistent with the SRN's predictions. The 2VP preference should continue on the post-V1 regions because a locally coherent context with two verbs is easier to handle than a context with three verbs.

Vasishth et al. (2008) mentioned the potential factor of comma insertions, which could serve as structural cues alerting the reader to a missing verb. However, empirically it is hardly possible to separate the comma effect from word order effects. Vasishth and colleagues did test English readers on comma-containing stimuli; but since English readers are not trained on commas used in this way, they are unable to draw as much information from comma positions as German readers do. In order to test whether the commas in fact influence structural predictions, the following study tested SRNs trained on German and English corpora both with and without commas.

4.4.2 Simulation 3: English

Model Parameters

For the forgetting effect simulation of English without commas (simulation 3a) no new training was necessary. The SRNs trained on the English corpora were tested on the grammatical and the ungrammatical condition in their state after one, two, and three epochs. For simulation 3b the English grammars for the training and testing corpora were enriched with commas, and the SRNs were trained and tested in the usual way.

3a: English without commas

Given the equivalence of Christiansen and Chater's training language and the English training grammar used here, the effects should be similar. In particular, the V1 and post-V1 regions should receive lower GPE values in the drop-V2 condition.

(24) Example test sentences:
a. the judge that the reporters that the senators understand praise attacked the senators . (no-drop)
b. the judges that the reporters that the lawyer praised attacked the senators . (drop-V2)


Results for 3a

Similar to the experiment in Vasishth et al. (2008), the assessed regions in the simulation were the three verbs V3, V2, V1 and the post-V1 region. The V2 region contains no data point in the ungrammatical condition because the verb is dropped in the testing stimuli. Figure 4.6 shows GPE values for the SRNs trained and tested on the simplified English grammar without commas. The left panel shows both conditions after two training epochs; the right panel shows results after three full corpus runs. The ungrammatical condition is called drop-V2 and the grammatical one is indicated by no-drop. The pattern was exactly as expected. The SRNs predicted a drop-V2 advantage on V1 and post-V1. No effect was predicted on V3, because no difference in stimuli and probability between the conditions is present at this point.

[Figure 4.6: Simulation 3a: English doubly-embedded ORC. GPE (0–1) on the three verbs (V3, V2, V1) and the subsequent region for the grammatical (no-drop) and ungrammatical (drop-V2) condition; left panel after two epochs of training, right panel after three epochs.]

3b: English with commas

The commas serve as clause boundary markers. In English SRCs they appear subsequent to nouns only. In the ORC, on the other hand, commas appear after nouns at the beginning of the sentence and after verbs at the end. In a doubly-embedded ORC there would be a comma after V3 and V2. Thus, the grammatical/ungrammatical sequence pair is no longer NNNVVV vs. NNNVV but rather N,N,NV,V,V vs. N,N,NV,V. The comma is a category with only one token which attaches to nouns or verbs and is not involved in long-distance dependencies. Hence, the activation pattern representing it should not be too complex. In fact, the learning of comma usage in ORCs can be scaled down to a counting-recursion problem of the pattern aabb instead of abba. As discussed in chapter 3, counting recursion is the easiest of the three recursion types for both humans and connectionist networks (Christiansen and Chater, 1999). Thus, it is very likely that the inclusion of commas facilitates processing in the grammatical condition, lowering the respective GPE values.

(25) English with commas:
a. SRC: S1 , V2 O2 , V3 O3 , V1 O1
b. ORC: S1 , S2 , S3 V3 , V2 , V1 O1

(26) Example test sentences:
a. the banker , that the banker , that the senators phone , understands , attacks the reporters . (no-drop)
b. the lawyer , that the senator , that the judges attack , praises the judge . (drop-V2)

Results for 3b

See figure 4.7 for the results of simulation 3b after two (left panel) and three epochs (right panel). Compared to simulation 3a there was a global improvement for both conditions. The most dramatic improvement happened on V3, which is predicted almost without errors after three epochs. In the earlier epoch there was more improvement due to comma insertion on V1 for the grammatical condition. As a result, the V1 error was the same in both conditions. However, after subsequent training the no-drop condition did not change on V1, whereas the drop-V2 condition improved further, resulting in a drop-V2 preference on V1. The opposite happened on post-V1, where training had affected the no-drop condition more. Here training did not affect the ungrammatical condition at all. In summary, there was a comma insertion × condition × training interaction, resulting in a drop-V2 preference after completed training. The stable error on post-V1 in the drop-V2 condition can be interpreted as a floor effect. The prediction of the determiner and the noun is already very good, with a GPE value around 0.1. It is very unlikely that the SRN learns the perfectly correct probabilities, resulting in a GPE value of zero, even after many epochs. Therefore, on the post-V1 region improvement by training is only possible for the slightly worse grammatical condition, which is why the two conditions settle on the same error value after three epochs. In conclusion, the insertion of commas definitely helps to make better predictions. However, training effects seem to be driven by rather local consistency, affecting the ungrammatical condition more than the grammatical one. Thus, looking at V1 after three epochs, the drop-V2 preference seems to be stable for English center-embedding.

[Figure 4.7: Simulation 3b: English doubly-embedded ORC with added commas. GPE (0–1) on the three verbs and the subsequent region for the grammatical (no-drop) and ungrammatical (drop-V2) condition; left panel after two epochs of training, right panel after three epochs.]

4.4.3 Simulation 4: German

Model Parameters

Simulation 4a tested German center-embedding with commas. The already trained networks from section 4.2 were used. For simulation 4b, training corpora created from a German grammar without commas were used. The testing corpora were built analogously to simulation 3.

4a: German with commas

The German grammar exhibits a regularity for verb-finality in RCs. This is different from the English grammar and should enable the SRN to distinguish 2VP and 3VP embeddings better than in English. As seen in the English simulation, commas have a facilitating effect, although the drop-V2 preference returned after further training. In German, commas could have an even greater facilitating effect. The reason is that the counting-recursion pattern aabb is not only applicable in the ORC, as in English, but also in the SRC, because both are center-embedding in German. As example (27) illustrates, both SRC and ORC contain the exact same pattern of nouns, verbs, and commas. In conclusion, the SRN trained on the German corpus should be very skilled at center-embedding recursion and the comma counting-recursion and hence should have much lower error rates for the grammatical condition.



(27) German with commas:
a. SRC: S1 , O2 , O3 V3 , V2 , V1 O1
b. ORC: S1 , S2 , S3 V3 , V2 , V1 O1

(28) Example test sentences:
a. der Polizist , den der Mensch , den der Polizist verspottet , ruft , verspottet den Jungen . (no-drop)
b. der Polizist , den der Junge , den der Polizist verspottet , ruft den Menschen . (drop-V2)

Results of 4a

The results of simulation 4a (German with commas) are shown in figure 4.8. There was a dramatic improvement compared to English on V2 and V1. Interestingly, the comparison by conditions did not reveal any difference on the main verb. However, a slight grammaticality (no-drop) preference was found on the post-V1 region; this preference was significant (p < 0.001).

[Figure 4.8: Simulation 4a: German doubly-embedded ORC. GPE on the three verbs and the subsequent region for the grammatical (no-drop) and ungrammatical (drop-V2) condition after two and three epochs of training.]

4b: German without commas

Since simulation 3b provided evidence for a comma effect, the removal of commas should make the SRN's predictions more error-prone. The verb-finality regularity in German, however, should still cause better predictions for the grammatical condition than in English. Simulation 4b tested SRNs trained on a comma-free German grammar.

[Figure 4.9: Simulation 4b: German doubly-embedded ORC without commas. GPE (0–1) on the three verbs and the subsequent region for the grammatical (no-drop) and ungrammatical (drop-V2) condition after two and three epochs of training.]

(29) Example test sentences:
a. der Polizist den der Mensch den die Passanten treffen ruft verspottet den Jungen . (no-drop)
b. der Passant den der Junge den der Polizist ruft beschimpft die Passanten . (drop-V2)

Results of 4b

The GPE values of the simulation involving German without commas (figure 4.9) show a pattern similar to English without commas. After two epochs, a drop-V2 preference was found: a small effect on V1 and a very pronounced effect on the following region. After completed training, V1 and post-V1 show a similarly sized drop-V2 advantage. Looking at the data, the comma aspect seems to be the most relevant one. Surprisingly, the regularity of verb-final structures does not seem to support correct predictions in German more than in English. Rather, the more regular use of commas in German has a strongly facilitating effect on both conditions, slightly more so on the grammatical one. The impact of comma usage and a comparison of the results to human data will be the topic of the next section.

4.4.4 Discussion

The results of simulation 3a (English without commas) and 4a (German with commas) were consistent with empirical studies (Gibson and Thomas, 1999; Christiansen and MacDonald, 1999; Vasishth et al., 2008), suggesting a difference in forgetting behavior between German and English. A reliable grammaticality preference in German as observed in Vasishth et al. (2008) could, however, not be replicated. Simulation 3a is perfectly consistent with the simulation of the forgetting effect in Christiansen and Chater (1999) and with the human data from Vasishth et al. The results show that not only the limited grammar used in Christiansen and Chater (1999) predicts the forgetting effect but also the more complex grammar used here. The inherent architectural constraints of SRNs predict a forgetting effect in English doubly-embedded ORCs. The insertion of commas in simulation 3b had an effect on the predictions, whereas the study by Vasishth and colleagues showed no effect. The missing comma effect in their study could be explained by the fact that English readers are not familiar with commas used in this way. The SRNs, on the other hand, were trained on corpora containing commas and thus had learned to use them as structural cues. The remaining drop-V2 preference on V1 still shows a certain consistency between the simulation and empirical data. More importantly, the model makes different predictions for German. Simulation 4a shows a similar performance on grammatical and ungrammatical sentences. In comparison, the Vasishth et al. study did find faster reading times in the grammatical condition for German readers. This is not predicted by the model, but it is the difference from English that is important here. Surprisingly, the SRNs trained on German without commas performed no better than the SRNs trained on English without commas. This yields a comma × language interaction. The greater effect of commas in German is explainable by the different comma patterns in the two languages (see examples 25 and 27). So the different predictions regarding the forgetting effect seem to be caused only indirectly by word order regularities. In particular, the consistent center-embedding (or counting-recursion) of commas in German makes them a reliable predictor, whereas this is not the case in English. The word order itself, however, did not have the expected effect, as experiment 4b shows. Controlling for the comma effect, the head-finality of SRCs in the simplified German grammar does not increase the performance on double center-embedding on V1 and post-V1. Maybe doubly-embedded RCs are too rare in the corpus to cause an effect. Another explanation could be the particularly increased complexity of prediction in German embedded RCs. In a German RC the cues establishing the agreement of the embedded verb are very subtle compared to English. In English the word order of 'who NP VP' versus 'who VP NP' decides the agreement, whereas in German there are several possible pairings of der, den, and die that determine the verb agreement. Thus, the verb agreement prediction requires a complex representation of previous input. Given the architectural limits of the network, highly complex representations are in a tradeoff with memory span. That means that distant dependencies and verb predictions are very hard to maintain. In other words, the trace of the VP-predicting NP in the representational cycle of the SRN decays faster. This in turn is compensated by increased training on center-embedding compared to the English simulation, resulting in comparable error values when trained without commas. This is, of course, an ad-hoc hypothesis and needs further investigation, which is beyond the scope of this thesis.

4.5 Conclusion

This thesis investigated the explanatory power of a certain implementation of the experience account. The well-established SRN modeling approach of MacDonald and Christiansen (2002) was adopted to test its predictions on two phenomena currently discussed in the literature. The RC extraction type preference in Mandarin and the forgetting effect in complex center-embedding were discussed and then modeled. First, the two problems were approached theoretically, reviewing results of empirical studies and discussing potential predictions of available theories. Concerning the Mandarin relative clauses, the studies showed exceptionally mixed results. However, an observed object advantage always appeared on the RC region, whereas a subject advantage was found only on the relativizer/head noun region. That fact and the experiment by Qiao and Forster (2008) suggest that Mandarin Chinese might have to be counted as an exception to a universal subject preference. On the other hand, the results for the forgetting effect were very clear and best explained by language-specific experience.

In chapter 3 the simple recurrent network was introduced and its properties were discussed. An SRN is a very simple and domain-unspecific model, but it accounts for the three necessities introduced at the beginning of this thesis: a) biological factors (architectural limits), b) continuous gradedness in performance, and c) experience.

In chapter 4 the experience theory predictions regard<strong>in</strong>g the two sample problems were<br />

practically assessed. Just like <strong>in</strong> the discussion and critique <strong>of</strong> MC02, the simulation<br />

results presented here looked promis<strong>in</strong>g on first sight; but sub-experiments and detailed<br />

data analysis revealed considerable <strong>in</strong>consistencies with respect to human data. The<br />

Mandar<strong>in</strong> RC simulation predicted an object preference, but the location <strong>of</strong> the effect<br />

was not consistent with human data. In addition, the second simulation demonstrated<br />

that the regularity effect was very weak. It becomes clear that the tra<strong>in</strong><strong>in</strong>g material used<br />

must be carefully chosen <strong>in</strong> order to guarantee comparability with other simulations and<br />

empirical studies. The forgett<strong>in</strong>g effect was predicted to be present <strong>in</strong> English but not <strong>in</strong><br />

German, consistent with human data. However, further simulations revealed the comma<br />

<strong>in</strong>sertion as the most important factor.<br />

Of course, it has to be clear that the simple network tra<strong>in</strong>ed on a simple grammar<br />

would not learn the same constra<strong>in</strong>ts as humans do. These simulations are rather approximations<br />

po<strong>in</strong>t<strong>in</strong>g to a certa<strong>in</strong> direction. A noticeable problem <strong>of</strong> the SRN predictions<br />

is their dependency on local coherence, which can also be described as a low memory<br />

span. This is, however, ma<strong>in</strong>ly dependent upon the specific properties <strong>of</strong> the learn<strong>in</strong>g<br />

81


Chapter 4 Two SRN Prediction Studies<br />

mechanism and the context loop. As mentioned <strong>in</strong> the previous chapter, there are other<br />

learn<strong>in</strong>g mechanisms that can <strong>in</strong>crease the span; although they might be cognitively very<br />

unmotivated. Interest<strong>in</strong>gly, however, there is evidence that even human readers rely on<br />

local coherence <strong>in</strong> certa<strong>in</strong> structures (Tabor et al., 2004). Another f<strong>in</strong>d<strong>in</strong>g is that the<br />

simulations reported <strong>in</strong> Christiansen and Chater (1999) and also the comma issue <strong>in</strong> simulations<br />

3 and 4 presented here showed that the SRN handles count<strong>in</strong>g-recursion better<br />

than other types. That may be the reason for the strong facilitat<strong>in</strong>g effect <strong>of</strong> comma<br />

<strong>in</strong>sertion compared to head-f<strong>in</strong>ality. Address<strong>in</strong>g this, it shall be noted that Rodriguez<br />

(2001) claims that SRNs can <strong>in</strong> fact carry out explicit symbolic count<strong>in</strong>g procedures.<br />
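To make the relevant distinction concrete: counting recursion (strings of the form a^n b^n, where any a may pair with any b) can be recognized with a single counter, whereas center-embedding (mirror recursion) requires a stack that matches each opener with the corresponding closer in reverse order. The following sketch (Python; purely illustrative and not part of the reported simulations; 'a' and 'b' stand for abstract noun and verb categories) contrasts the two:

def accepts_counting(s):
    # a^n b^n: one counter is enough -- count up on a's, down on b's,
    # and never allow an 'a' after a 'b'.
    count, seen_b = 0, False
    for sym in s:
        if sym == 'a':
            if seen_b:
                return False
            count += 1
        elif sym == 'b':
            seen_b = True
            count -= 1
            if count < 0:
                return False
        else:
            return False
    return count == 0 and seen_b

def accepts_mirror(s):
    # Center-embedding: the second half must mirror the first half
    # (the i-th opener matches the i-th closer from the end), which
    # requires a stack rather than a counter.
    if len(s) % 2 != 0:
        return False
    half = len(s) // 2
    stack = list(s[:half])
    for sym in s[half:]:
        if not stack or stack.pop() != sym:
            return False
    return True

print(accepts_counting('aaabbb'))  # True
print(accepts_mirror('abba'))      # True
print(accepts_mirror('abab'))      # False (cross-serial order, needs a queue)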

This work argued for a uniform account of individual and language-specific differences as well as of language-independent processing skill. All three can to a considerable extent be attributed to experience with the individual linguistic environment in interaction with architectural preconditions. It can be concluded that much work is still necessary before fine-grained experience-based predictions can be obtained for the highly complex task of sentence comprehension. In any case, the literature shows a promising trend towards PDP models of language comprehension, accompanied by the integration of corpus analyses and acquisition data.



Bibliography

J. R. Anderson and C. Lebiere. The Atomic Components of Thought. Lawrence Erlbaum Associates, 1998.

J. R. Anderson, D. Bothell, M. D. Byrne, S. Douglass, C. Lebiere, and Y. Qin. An integrated theory of the mind. Psychological Review, 111(4):1036–1060, 2004.

E. Bach, C. Brown, and W. Marslen-Wilson. Crossed and nested dependencies in German and Dutch: A psycholinguistic study. Language and Cognitive Processes, 1(4):249–262, 1986.

A. D. Baddeley. Human Memory: Theory and Practice. Psychology Press, 1997.

R. C. Berwick and A. S. Weinberg. The Grammatical Basis of Linguistic Performance. MIT Press, 1984.

T. G. Bever. The cognitive basis for linguistic structures. In J. R. Hayes, editor, Cognition and the Development of Language, pages 279–362. Wiley, New York, 1970.

M. S. Blaubergs and M. D. S. Braine. Short-term memory limitations on decoding self-embedded sentences. Journal of Experimental Psychology, 102(4):745–748, 1974.

A. L. Blumenthal. Observations with self-embedded sentences. Psychonomic Science, 6(10):453–454, 1966.

J. K. Bock. An effect of the accessibility of word forms on sentence structures. Journal of Memory and Language, 26(2):119–137, 1987.

D. Caplan, S. Vijayan, G. Kuperberg, C. West, G. Waters, D. Greve, and A. M. Dale. Vascular responses to syntactic processing: Event-related fMRI study of relative clauses. Human Brain Mapping, 15(1):26–38, 2002.

N. Chomsky. Lectures on Government and Binding: The Pisa Lectures. Studies in Generative Grammar. Foris Publications, 1981.

N. Chomsky. Aspects of the Theory of Syntax. MIT Press, Cambridge, 1965.

N. Chomsky. Syntactic Structures. Mouton, The Hague, 1957.


M. H. Christiansen. The (non)necessity of recursion in natural language. In Proceedings of the Fourteenth Annual Conference of the Cognitive Science Society. Lawrence Erlbaum Associates, 1992.

M. H. Christiansen and N. Chater. Toward a connectionist model of recursion in human linguistic performance. Cognitive Science, 23(2):157–205, 1999.

M. H. Christiansen and M. C. MacDonald. Processing of recursive sentence structure: Testing predictions from a connectionist model. Manuscript in preparation, 1999.

L. Cohen and J. Mehler. Click monitoring revisited: An on-line study of sentence comprehension. Memory & Cognition, 24(1):94–102, 1996.

M. Corley and S. Corley. Cross-linguistic and intra-linguistic evidence for the use of statistics in human sentence processing. Unpublished manuscript, University of Exeter, 1995.

F. Cuetos, D. Mitchell, and M. Corley. Parsing in different languages. In Language Processing in Spanish, pages 145–187, 1996.

M. Daneman and P. A. Carpenter. Individual differences in working memory and reading. Journal of Verbal Learning and Verbal Behavior, 19(4):450–466, 1980.

G. S. Dell and P. G. O'Seaghdha. Stages of lexical access in language production. Cognition, 42(1-3):287–314, 1992.

G. S. Dell, L. K. Burger, and W. R. Svec. Language production and serial order: A functional analysis and a model. Cognitive Modeling, 2002.

T. Desmet and E. Gibson. Disambiguation preferences and corpus frequencies in noun phrase conjunction. Journal of Memory and Language, 49(3):353–374, 2003.

J. A. Van Dyke and R. L. Lewis. Distinguishing effects of structure and decay on attachment and repair: A cue-based parsing account of recovery from misanalyzed ambiguities. Journal of Memory and Language, 49(3):285–316, 2003.

J. A. Van Dyke and B. McElree. Retrieval interference in sentence comprehension. Journal of Memory and Language, 55(2):157–166, 2006.

S. F. Ehrlich and K. Rayner. Contextual effects on word perception and eye movements during reading. Journal of Verbal Learning and Verbal Behavior, 20:641–655, 1981.

J. L. Elman. Tlearn simulator. Software available at http://crl.ucsd.edu/innate/tlearn.html, 1992.

J. L. Elman. Finding structure in time. Cognitive Science, 14(2):179–211, 1990.


P. Erdmann. Ist das Deutsche eine SOV-Sprache? Zeitschrift für deutsche Sprache und Literatur, 1990.

F. Ferreira. The misinterpretation of noncanonical sentences. Cognitive Psychology, 47(2):164–203, 2003.

M. Ford. A method for obtaining measures of local parsing complexity throughout sentences. Journal of Verbal Learning and Verbal Behavior, 22:203–218, 1983.

K. I. Forster, C. Guerrera, and L. Elliot. The maze task: Measuring forced incremental sentence processing time. Manuscript submitted for publication, 2008.

U. Frauenfelder and J. Segui. Monitoring around the relative clause. Journal of Verbal Learning and Verbal Behavior, 19:328–337, 1980.

L. Frazier. Syntactic processing: Evidence from Dutch. Natural Language and Linguistic Theory, 5(4):519–559, 1987.

L. Frazier. On Comprehending Sentences: Syntactic Parsing Strategies. PhD thesis, University of Connecticut, 1979.

L. Frazier. Syntactic complexity. In D. R. Dowty, L. Karttunen, and A. M. Zwicky, editors, Natural Language Parsing: Psychological, Computational, and Theoretical Perspectives, pages 129–189. Cambridge University Press, 1985.

L. Frazier and C. Clifton. Successive cyclicity in the grammar and the parser. Language and Cognitive Processes, 4:93–126, 1989.

L. Frazier and C. Clifton. Construal. MIT Press, 1996.

L. Frazier and G. B. Flores d'Arcais. Filler-driven parsing: A study of gap filling in Dutch. Journal of Memory and Language, 28:331–344, 1989.

L. Frazier, C. Clifton, and J. Randall. Filling gaps: Decision principles and structure in sentence comprehension. Cognition, 13(2):187–222, 1983.

E. Gibson. The dependency locality theory: A distance-based theory of linguistic complexity. In Image, Language, Brain: Papers from the First Mind Articulation Project Symposium, pages 95–126, 2000.

E. Gibson. Linguistic complexity: Locality of syntactic dependencies. Cognition, 68(1):1–76, 1998.

E. Gibson and C. T. Schütze. Disambiguation preferences in noun phrase conjunction do not mirror corpus frequency. Journal of Memory and Language, 40(2):263–279, 1999.


E. Gibson and J. Thomas. Memory limitations and structural forgetting: The perception of complex ungrammatical sentences as grammatical. Language and Cognitive Processes, 14(3):225–248, 1999.

E. Gibson, T. Desmet, D. Grodner, D. Watson, and K. Ko. Reading relative clauses in English. Cognitive Linguistics, 16(2):313–353, 2005a.

E. Gibson, K. Nakatani, and E. Chen. Distinguishing theories of syntactic storage cost in sentence comprehension: Evidence from Japanese. To appear, 2005b.

P. C. Gordon, R. Hendrick, and M. Johnson. Memory interference during language processing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27(6):1411–1423, 2001.

P. C. Gordon, R. Hendrick, and W. H. Levine. Memory-load interference in syntactic processing. Psychological Science, 13(5):425–430, 2002.

P. C. Gordon, R. Hendrick, and M. Johnson. Effects of noun phrase type on sentence complexity. Journal of Memory and Language, 51(1):97–114, 2004.

P. C. Gordon, R. Hendrick, M. Johnson, and Y. Lee. Similarity-based interference during language comprehension: Evidence from eye tracking during reading. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32(6):1304, 2006.

A. Gouvea. Processing Syntactic Complexity: Cross-linguistic Differences and ERP Evidence. PhD thesis, University of Maryland, College Park, 2003.

J. H. Greenberg. Some universals of grammar with particular reference to the order of meaningful elements. In J. H. Greenberg, editor, Universals of Language, pages 73–113. MIT Press, London, 1963.

D. Grodner and E. Gibson. Consequences of the serial nature of linguistic input for sentential complexity. Cognitive Science, 29(2):261–290, 2005.

J. Hale. A probabilistic Earley parser as a psycholinguistic model. In Proceedings of the Second Meeting of the North American Chapter of the Association for Computational Linguistics, pages 1–8, 2001.

B. Hemforth, L. Konieczny, and C. Scheepers. Syntactic attachment and anaphor resolution: Two sides of relative clause attachment. In Architectures and Mechanisms for Language Processing, pages 259–281, 2000.

R. N. A. Henson. Short-term memory for serial order: The start-end model. Cognitive Psychology, 36(2):73–137, 1998.

V. M. Holmes and J. K. O'Regan. Eye fixation patterns during the reading of relative clause sentences. Journal of Verbal Learning and Verbal Behavior, 20(4), 1981.


F. Hsiao and E. Gibson. Processing relative clauses in Chinese. Cognition, 90(1):3–27, 2003.

C.-C. Hsu, F. Hurewitz, and C. Phillips. Contextual and syntactic cues for head-final relative clauses in Chinese. In The 19th Annual CUNY Conference on Human Sentence Processing, New York, NY, 2006.

T. Ishizuka, K. Nakatani, and E. Gibson. Processing Japanese relative clauses in context. In The 19th Annual CUNY Conference on Human Sentence Processing, CUNY Graduate Center, New York, NY, 2006.

F. T. Jaeger, E. Fedorenko, P. Hofmeister, and E. Gibson. Expectation-based syntactic processing: Antilocality outside of head-final languages. CUNY Sentence Processing Conference, North Carolina, 2008.

C. Juliano and M. K. Tanenhaus. A constraint-based lexicalist account of the subject/object attachment preference. Journal of Psycholinguistic Research, 23(6):459–471, 1994.

D. Jurafsky. A probabilistic model of lexical and syntactic access and disambiguation. Cognitive Science, 20(2):137–194, 1996.

M. A. Just and P. A. Carpenter. A capacity theory of comprehension: Individual differences in working memory. Psychological Review, 99(1):122–149, 1992.

M. A. Just and P. A. Carpenter. A capacity theory of comprehension: Individual differences in working memory. Cognitive Modeling, pages 131–177, 2002.

M. A. Just and S. Varma. A hybrid architecture for working memory: Reply to MacDonald and Christiansen (2002). Psychological Review, 109(1):55–65, 2002.

M. A. Just, P. A. Carpenter, and J. D. Woolley. Paradigms and processes in reading comprehension. Journal of Experimental Psychology: General, 111(2):228–238, 1982.

M. A. Just, P. A. Carpenter, T. A. Keller, W. F. Eddy, and K. R. Thulborn. Brain activation modulated by sentence comprehension. Science, 274(5284):114, 1996.

A. Kawamoto. Nonlinear dynamics in the resolution of lexical ambiguity: A parallel distributed processing account. Journal of Memory and Language, 32(4):474–516, 1993.

E. L. Keenan and B. Comrie. Noun phrase accessibility and universal grammar. Linguistic Inquiry, 8(1):63–99, 1977.

J. King and M. A. Just. Individual differences in syntactic processing: The role of working memory. Journal of Memory and Language, 30(5):580–602, 1991.


J. W. King and M. Kutas. Who did what and when? Using word- and clause-level ERPs to monitor working memory usage in reading. Journal of Cognitive Neuroscience, 7(3):376–395, 1995.

L. Konieczny. Locality and parsing complexity. Journal of Psycholinguistic Research, 29(6):627–645, 2000.

L. Konieczny and N. Ruh. What's in an error? A reply to MacDonald and Christiansen (2002). Manuscript submitted, University of Freiburg, 2003.

K. Kuo and S. Vasishth. Processing Chinese relative clauses: Evidence for the universal subject preference. Manuscript submitted, 2007.

N. Kwon, M. Polinsky, and R. Kluender. Processing of relative clause sentences in Korean. Poster presented at the AMLaP Conference, 2004.

W. Larkin and D. Burns. Sentence comprehension and memory for embedded structure. Memory and Cognition, 5(1):17–22, 1977.

R. Levy. Expectation-based syntactic comprehension. Cognition, 106(3):1126–1177, 2008.

R. Lewis. A theory of grammatical but unacceptable embeddings. Journal of Psycholinguistic Research, 25:93–116, 1996.

R. L. Lewis and S. Vasishth. An activation-based model of sentence processing as skilled memory retrieval. Cognitive Science, 29:1–45, May 2005.

R. L. Lewis, S. Vasishth, and J. Van Dyke. Computational principles of working memory in sentence comprehension. Trends in Cognitive Sciences, 10(10):447–454, October 2006.

C. C. Lin. The psychological reality of head-final relative clauses. Paper presented at the International Workshop on Relative Clauses, Academia Sinica, Taipei, 2007.

C. C. Lin and T. G. Bever. Chinese is no exception: Universal subject preference of relative clause processing. Paper presented at the 19th Annual CUNY Conference on Human Sentence Processing, CUNY Graduate Center, New York, NY, 2006a.

C. C. Lin and T. G. Bever. Subject preference in the processing of relative clauses in Chinese. In D. Baumer, D. Montero, and M. Scanlon, editors, Proceedings of the 25th West Coast Conference on Formal Linguistics, pages 254–260. Cascadilla Proceedings Project, Somerville, MA, 2006b.

C.-J. C. Lin and T. G. Bever. Processing head-final relative clauses without garden paths. Paper presented at the International Conference on Processing Head-Final Structures, Rochester Institute of Technology, Rochester, NY, September 21–22, 2007.


C. J. C. Lin, S. Fong, and T. G. Bever. Constructing filler-gap dependencies in Chinese possessor relative clauses. In Proceedings of PACLIC, 2005.

Y. Lin and S. Garnsey. Plausibility and the resolution of temporary ambiguity in relative clause comprehension in Mandarin. In Proceedings of the 20th Annual CUNY Conference on Human Sentence Processing, 2007.

M. C. MacDonald and M. H. Christiansen. Reassessing working memory: Comment on Just and Carpenter (1992) and Waters and Caplan (1996). Psychological Review, 109(1):35–54, 2002.

M. C. MacDonald, N. J. Pearlmutter, and M. S. Seidenberg. Lexical nature of syntactic ambiguity resolution. Psychological Review, 101(4):676–703, October 1994.

B. MacWhinney. Starting points. Language, 53(1):152–168, 1977.

B. MacWhinney. Basic syntactic processes. Language Development, 1:73–136, 1982.

B. MacWhinney and C. Pleh. The processing of restrictive relative clauses in Hungarian. Cognition, 29(2):95–141, 1988.

W. M. Mak, W. Vonk, and H. Schriefers. The influence of animacy on relative clause processing. Journal of Memory and Language, 47(1):50–68, 2002.

Y. Matsumoto. Noun-modifying Constructions in Japanese: A Frame-semantic Approach. John Benjamins, 1997.

J. L. McClelland and J. L. Elman. The TRACE model of speech perception. Center for Research in Language, University of California, San Diego, 1984.

A. Mecklinger, H. Schriefers, K. Steinhauer, and A. D. Friederici. Processing relative clauses varying on syntactic and semantic dimensions: An analysis with event-related potentials. Memory & Cognition, 23(4):477–494, 1995.

D. C. Mitchell, F. Cuetos, M. M. B. Corley, and M. Brysbaert. Exposure-based models of human parsing: Evidence for the use of coarse-grained (nonlexical) statistical records. Journal of Psycholinguistic Research, 24(6):469–488, 1995.

E. Miyamoto and M. Nakamura. Subject/object asymmetries in the processing of relative clauses in Japanese. In Proceedings of WCCFL, volume 22, pages 342–355, 2003.

X. Qiao and K. I. Forster. Object relatives ARE easier than subject relatives in Chinese. In Proceedings of the AMLaP Conference, 2008.

D. S. Race and M. C. MacDonald. The use of "that" in the production and comprehension of object relative clauses. In Proceedings of the 25th Annual Meeting of the Cognitive Science Society, pages 946–951, 2003.


F. Reali and M. H. Christiansen. Processing of relative clauses is made easier by frequency of occurrence. Journal of Memory and Language, 57(1):1–23, 2007a.

F. Reali and M. H. Christiansen. Word chunk frequencies affect the processing of pronominal object-relative clauses. The Quarterly Journal of Experimental Psychology, 60(2):161–170, 2007b.

R. Roberts and E. Gibson. Individual differences in sentence memory. Journal of Psycholinguistic Research, 31(6):573–598, November 2002.

P. Rodriguez. Simple recurrent networks learn context-free and context-sensitive languages by counting. Neural Computation, 13, 2001.

D. L. T. Rohde. A Connectionist Model of Sentence Comprehension and Production. PhD thesis, Carnegie Mellon University, 2002.

D. L. T. Rohde. The simple language generator: Encoding complex languages with simple grammars. Technical Report CMU-CS-99-123, Carnegie Mellon University, Department of Computer Science, 1999.

D. E. Rumelhart and J. L. McClelland. On learning the past tenses of English verbs. Center for Research in Language, University of California, San Diego, 1985.

H. Schriefers, A. D. Friederici, and K. Kuhn. The processing of locally ambiguous relative clauses in German. Journal of Memory and Language, 34(4):499–520, 1995.

M. S. Seidenberg and J. L. McClelland. A distributed, developmental model of word recognition and naming. Psychological Review, 96(4):523–568, 1989.

M. Spivey-Knowlton. Quantitative predictions from a constraint-based theory of syntactic ambiguity resolution. In Proceedings of the 1993 Connectionist Models Summer School, pages 130–137. Lawrence Erlbaum Associates, 1994.

K. Stromswold, D. Caplan, N. Alpert, and S. Rauch. Localization of syntactic comprehension by positron emission tomography. Brain and Language, 52(3):452–473, 1996.

W. Tabor, C. Juliano, and M. K. Tanenhaus. Parsing in a dynamical system: An attractor-based account of the interaction of lexical and structural constraints in sentence processing. Language and Cognitive Processes, 12(2/3):211–271, 1997.

W. Tabor, B. Galantucci, and D. Richardson. Effects of merely local syntactic coherence on sentence processing. Journal of Memory and Language, 50(4):355–370, May 2004.

W. L. Taylor. Cloze procedure: A new tool for measuring readability. Journalism Quarterly, 30(4):415–433, 1953.


R. S. Tomlin. Basic Word Order: Functional Principles. Croom Helm, London, 1986.

M. J. Traxler, R. K. Morris, and R. E. Seely. Processing subject and object relative clauses: Evidence from eye movements. Journal of Memory and Language, 47(1):69–90, July 2002.

S. Vasishth. Integration and prediction in head-final structures. In Processing and Producing Head-Final Structures. 2008.

S. Vasishth and R. L. Lewis. Human language processing: Symbolic models. In K. Brown, editor, Encyclopedia of Language and Linguistics, volume 5, pages 410–419. Elsevier, 2006a.

S. Vasishth and R. L. Lewis. Argument-head distance and processing complexity: Explaining both locality and antilocality effects. Language, 82(4):767–794, 2006b.

S. Vasishth, K. Suckow, R. Lewis, and S. Kern. Short-term forgetting in sentence comprehension: Crosslinguistic evidence from head-final structures. Submitted to Language and Cognitive Processes, 2008.

G. S. Waters and D. Caplan. The capacity theory of sentence comprehension: Critique of Just and Carpenter (1992). Psychological Review, 103(4):761–772, 1996.

J. B. Wells, M. H. Christiansen, D. S. Race, D. J. Acheson, and M. C. MacDonald. Experience and sentence processing: Statistical learning and relative clause comprehension. Cognitive Psychology, 58(2):250–271, 2009.

M. Yoshida, S. Aoshima, and C. Phillips. Relative clause prediction in Japanese. In Proceedings of the 17th Annual CUNY Conference on Human Sentence Processing, College Park, Maryland, 2004.


Appendix A

Statistics

                  SRC                     ORC
region       mean        se          mean        se
N1/V1        0.1746577   0.0568403   0.2183628   0.2644270
de           0.3303811   0.1897248   0.1207579   0.0726347
N2           0.1001319   0.0391884   0.1064459   0.0342768

Table A.1: Statistics for simulation 1

                  SRC                     ORC
region       mean        se          mean        se
N1/V1        0.1319658   0.06477893  0.2172967   0.2631532
de           0.0870274   0.07931393  0.1096769   0.0772652
N2           0.1001319   0.03918843  0.1064459   0.0342768

Table A.2: Statistics for simulation 2

                  drop-V2                 no-drop
region       mean        se          mean        se
V3           0.7976032   0.1091270   0.7977388   0.1090123
V1           0.8639397   0.0581128   0.9801276   0.0140664
post-V1      0.1610184   0.1042171   0.2658784   0.1700682

Table A.3: Statistics for simulation 3a

                  drop-V2                 no-drop
region       mean        se          mean        se
V3           0.1794197   0.0801674   0.1797425   0.08047316
V1           0.6870279   0.0550648   0.7735128   0.05198529
post-V1      0.1365552   0.1125236   0.1522183   0.0880624

Table A.4: Statistics for simulation 3b

                  drop-V2                 no-drop
region       mean        se          mean        se
V3           0.1376888   0.1044064   0.1375375   0.1043905
V1           0.5554193   0.2136721   0.5564368   0.2350018
post-V1      0.3160993   0.122175    0.2946364   0.1392607

Table A.5: Statistics for simulation 4a

                  drop-V2                 no-drop
region       mean        se          mean        se
V3           0.1462594   0.1230590   0.1465719   0.1232603
V1           0.8691347   0.1514235   0.9761057   0.02860683
post-V1      0.3004169   0.2266813   0.4357311   0.1730854

Table A.6: Statistics for simulation 4b



Appendix B

Grammars

B.1 English

(written by Lars Konieczny, 2003)

S : NP VP "." |
    {num1, NP N, VP Vi} |    # number agreement in matrix clause
    {num2, NP N, VP Vt} ;

NP : det N | det N Rel (0.05) |
    {num1, N, Rel SRC VP Vi} |    # number agreement in subject RCs
    {num2, N, Rel SRC VP Vt} ;

Rel : SRC | ORC ;
SRC : that VP ;
ORC : that NP Vt |
    {num2, NP N, Vt} ;    # number agreement in object RCs
VP : Vi | Vt NP ;
N : Nsing | Nplur ;
Vi : VIsing | VIplur ;
Vt : VTsing | VTplur ;

### LEXICON ###

Nsing : lawyer | senator | reporter | banker | judge ;
Nplur : lawyers | senators | reporters | bankers | judges ;
VIsing : lies | lied | hesitates | hesitated | phones |
    phoned | understands | understood ;
VTsing : praises | praised | attacks | attacked |
    phones | phoned | understands | understood ;
VIplur : lie | lied | hesitate | hesitated | phone |
    phoned | understand | understood ;
VTplur : praise | praised | attack | attacked |
    phone | phoned | understand | understood ;
det : the ;

### RULES ###

num1 {    # for intransitive verbs
    Nsing : VIsing;
    Nplur : VIplur;
}

num2 {    # for transitive verbs
    Nsing : VTsing;
    Nplur : VTplur;
}
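For illustration, here are two strings that the grammar above can generate (derived by hand from the rules, so merely exemplary): a simple transitive sentence, and a sentence with a subject relative clause in which the num1/num2 constraints enforce number agreement across the embedding:

the lawyer praises the judges .
the senators that attacked the banker hesitate .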

B.2 German

(written by Daniel Müller and Lars Konieczny, 2004)

S : NPnom VP "." |
    {numNoun, NPnom Nnom, VP V} |   # number agreement (N-V), canonical order
    {numNoun, NPnomx Nnom, V} ;     # number agreement, topicalized

NPnomx : DETnom Nnom RCx |
    {numDET, DETnom, Nnom} |                    # number agreement DET-N
    {numREL, Nnom, RCx RCpure RCsub RELnom} |   # number N-RelPron (nom)
    {numREL, Nnom, RCx RCpure RCobj RELakk} |   # number N-RelPron (acc)
    {numNoun, Nnom, RCx RCpure RCsub V} ;       # number N-V in embedded SRC

NPnom : DETnom Nnom RC |
    {numDET, DETnom, Nnom} |                    # number agreement DET-N
    {numREL, Nnom, RC RCpure RCsub RELnom} |    # number N-RelPron (nom)
    {numREL, Nnom, RC RCpure RCobj RELakk} |    # number N-RelPron (acc)
    {numNoun, Nnom, RC RCpure RCsub V} ;        # number N-V in embedded SRC

NPakkx : DETakk Nakk RCx |
    {numDET, DETakk, Nakk} |
    {numREL, Nakk, RCx RCpure RCsub RELnom} |
    {numREL, Nakk, RCx RCpure RCobj RELakk} |
    {numNoun, Nakk, RCx RCpure RCsub V} ;

NPakk : DETakk Nakk RC |
    {numDET, DETakk, Nakk} |
    {numREL, Nakk, RC RCpure RCsub RELnom} |
    {numREL, Nakk, RC RCpure RCobj RELakk} |
    {numNoun, Nakk, RC RCpure RCsub V} ;

VP : V NPakkx ;

RCx : "," RCpure | "" (0.9) ;
RC : "," RCpure "," | "" (0.9) ;
RCpure : RCsub | RCobj ;
RCsub : RELnom NPakk V ;
RCobj : RELakk NPnom V |
    {numNoun, NPnom Nnom, V} ;

Nnom : Nnom_pl | Nnom_sing ;
Nakk : Nakk_pl | Nakk_sing ;
V : V_pl | V_sing ;
DETnom : DETnom_pl | DETnom_sing (0.7) ;
DETakk : DETakk_pl | DETakk_sing (0.7) ;
RELnom : RELnom_pl | RELnom_sing ;
RELakk : RELakk_pl | RELakk_sing ;

Nnom_pl : Jungen | Polizisten | Passanten | Menschen ;
Nakk_pl : Jungen | Polizisten | Passanten | Menschen ;
Nnom_sing : Junge | Polizist | Passant | Mensch ;
Nakk_sing : Jungen | Polizisten | Passanten | Menschen ;

V_pl : beschimpfen | treffen | rufen | verspotten ;
V_sing : beschimpft | trifft | ruft | verspottet ;

DETnom_pl : die ;
DETnom_sing : der ;
DETakk_pl : die ;
DETakk_sing : den ;
RELnom_pl : die ;
RELnom_sing : der ;
RELakk_pl : die ;
RELakk_sing : den ;

numNoun {
    Nnom_sing : V_sing;
    Nnom_pl : V_pl;
    Nakk_sing : V_sing;
    Nakk_pl : V_pl;
}

numDET {
    DETnom_pl : Nnom_pl;
    DETakk_pl : Nakk_pl;
    DETnom_sing : Nnom_sing;
    DETakk_sing : Nakk_sing;
}

numREL {
    Nnom_pl : RELnom_pl | RELakk_pl;
    Nakk_pl : RELnom_pl | RELakk_pl;
    Nnom_sing : RELnom_sing | RELakk_sing;
    Nakk_sing : RELnom_sing | RELakk_sing;
}
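For illustration, two hand-derived strings from this grammar (merely exemplary): a simple transitive clause, and a clause with an object relative clause in which the accusative relative pronoun den and the plural embedded subject jointly determine the form of the embedded verb:

der Junge trifft den Polizisten .
der Junge , den die Passanten rufen , verspottet den Polizisten .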

B.3 Mandarin

S : NP VP END ;
VP : Vt NP | Vi ;
NP : N | Rel N (0.5) ;
Rel : SRC (0.85) | ORC ;
SRC : VP GEN ;
ORC : NP Vt GEN ;
N : Nsing | Nplur ;

### LEXICON ###

Nsing : lushi | guanyuan | fayanren |
    yinhangjia | faguan ;
Nplur : lushimen | guanyuanmen | fayanren |
    yinhangjiamen | faguanmen ;
Vi : sahuang | youyu | dadianhua | lijie ;
Vt : biaoyang | gongji | lijie ;
GEN : de ;
END : "." ;

biaoyang        praise
dadianhua       phone
de              gen
faguan          judge
faguanmen       judges
fayanren        reporter/reporters
gongji          attack
guanyuan        senator
guanyuanmen     senators
lijie           understand
lushi           lawyer
lushimen        lawyers
sahuang         lie
yinhangjia      banker
yinhangjiamen   bankers
youyu           hesitate

Table B.1: Mandarin lexicon
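For illustration, three hand-derived strings from this grammar (merely exemplary; glosses follow Table B.1): a simple transitive sentence, a subject relative clause, and an object relative clause:

lushi biaoyang guanyuanmen .            (the lawyer praises the senators)
biaoyang faguan de yinhangjia youyu .   (the banker who praises the judge hesitates)
lushi gongji de guanyuan sahuang .      (the senator whom the lawyer attacks lies)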



Declaration of Authorship

I hereby declare in lieu of an oath that I have written this thesis without the help of third parties and without the use of aids other than those indicated; all ideas taken directly or indirectly from external sources are identified as such. This thesis has not been submitted in the same or a similar form to any other examination authority and has not yet been published.

Place, date                    Signature

