Connectionist Modeling of Experience-based Effects in Sentence Comprehension

Master's thesis (Magisterarbeit) in Computational Linguistics
at the Institute of Linguistics, University of Potsdam

Submitted by
Felix Engelmann
Matriculation number: 716604
Potsdam, February 2009

First reviewer: Prof. Dr. Shravan Vasishth
Second reviewer: Dr. Heiner Drenhaus
Abstract

This thesis addresses the role of individual linguistic experience in computational models in psycholinguistics. Recent research increasingly reveals language-specific and speaker-specific processing differences, which pose a challenge for language-independent universal models. Starting from the assumption that individual and language-specific abilities derive from the speaker's linguistic environment, experience-based theories attempt to capture the highly complex relationships between corpus-based regularities and language ability. While explicitly designed symbolic models can only provide strongly simplified representations, connectionist methods make it possible to build models capable of learning, with which functional relations between experience and linguistic abilities can be established.

This thesis examines the explanatory power of such connectionist models, in comparison to traditional approaches, by means of two example phenomena. First, the recent literature on subject and object relative clauses in Mandarin Chinese is discussed. Although the relevant reading studies show very mixed results, they suggest that object relative clauses in Mandarin are processed more easily than subject relative clauses, in contrast to the otherwise cross-linguistic subject preference.

The second object of investigation is the grammaticality illusion in ungrammatical center embeddings. While in English the omission of an embedded verb yields higher acceptability than the grammatically correct version, this illusion does not arise for German readers.

Building on a connectionist model by MacDonald and Christiansen (2002), which makes consistent predictions for individual differences in the processing of English relative clauses, new simulations are carried out to compare both phenomena with model predictions. The simulation results predict an object preference for Mandarin and the absence of the grammaticality illusion in German. The empirical consistency of the results, however, is only superficial and does not withstand closer analysis.
Acknowledgments

I am grateful to Bei Wang, who was so kind as to teach me about Chinese relative clause grammar for this work. For revisions and helpful comments I owe special thanks to Pavel Logaçev and Titus von der Malsburg. Most of all, I want to thank my supervisor Shravan Vasishth for all his support and patience.
Contents

List of Figures
List of Tables

1 Preliminaries
1.1 Introduction
1.2 Relative Clauses and Complexity
1.3 Psycholinguistic Aspects
1.3.1 Memory
1.3.2 Expectation
1.3.3 Canonicity
1.3.4 Experience

2 Issues in Relative Clause Processing
2.1 The Subject/Object Difference
2.2 Chinese Relative Clauses
2.3 Predicting RC Extraction Preferences Cross-linguistically
2.3.1 Memory
2.3.2 Expectation
2.3.3 Canonicity
2.3.4 Experience
2.3.5 Other Explanations
2.3.6 Summary
2.4 The RC Extraction Preference in Mandarin
2.5 Forgetting Effects
2.5.1 The Grammaticality Illusion
2.5.2 Explaining the Forgetting Effect

3 Connectionist Modelling of Language Comprehension
3.1 Structure and Learning
3.2 Recursion and Complexity
3.3 A Model of RC Processing
3.3.1 MacDonald and Christiansen (2002)
3.3.2 Critique and Relation to other Approaches
3.3.3 What is learned?
3.3.4 Summary

4 Two SRN Prediction Studies
4.1 The Model
4.1.1 Network Architecture
4.1.2 Grammar and Corpora
4.1.3 Training and Testing
4.2 Replication of Previous Simulations
4.3 RC Extraction in Mandarin
4.3.1 Simulation 1: Regularity
4.3.2 Simulation 2: Frequency
4.3.3 Discussion
4.4 Forgetting Effects
4.4.1 The Model
4.4.2 Simulation 3: English
4.4.3 Simulation 4: German
4.4.4 Discussion
4.5 Conclusion

Bibliography

A Statistics

B Grammars
B.1 English
B.2 German
B.3 Mandarin
List of Figures

1.1 DLT memory cost for English RC
2.1 English RC reading times (King and Just, 1991)
2.2 DLT memory cost for English RCs
2.3 DLT memory cost for Mandarin RCs
2.4 CC-READER simulation on English RCs (Just and Carpenter, 1992)
2.5 DLT memory cost for the three VPs in a doubly embedded ORC
2.6 Ungrammaticality in English (Vasishth et al., 2008)
2.7 Ungrammaticality in German (Vasishth et al., 2008)
3.1 Simple recurrent network (Elman, 1990)
3.2 Ungrammaticality simulation (Christiansen and Chater, 1999)
3.3 Frequency × Regularity simulation (MacDonald and Christiansen, 2002)
3.4 Wells et al. (2009)
3.5 Wells et al. (2009) compared with MacDonald and Christiansen (2002)
4.1 Replication of MacDonald and Christiansen (2002)
4.2 Replication of Konieczny and Ruh (2003)
4.3 Output node activations on the relativizer in Mandarin
4.4 Simulation 1: Mandarin ORC regularity
4.5 Simulation 2: Mandarin SRC frequency
4.6 Simulation 3a: Forgetting effect in English without commas
4.7 Simulation 3b: Forgetting effect in English with commas
4.8 Simulation 4a: Forgetting effect in German
4.9 Simulation 4b: Forgetting effect in German without commas
List of Tables

2.1 Languages with subject preference (Lin and Bever, 2006b)
2.2 Mandarin corpus study (Kuo and Vasishth, 2007)
2.3 RC extraction preference predictions
2.4 Studies of Mandarin RC extraction
A.1 Statistics for simulation 1
A.2 Statistics for simulation 2
A.3 Statistics for simulation 3a
A.4 Statistics for simulation 3b
A.5 Statistics for simulation 4a
A.6 Statistics for simulation 4b
B.1 Mandarin lexicon
Chapter 1
Preliminaries

1.1 Introduction
Psycholinguistic models of human language processing in the tradition of competence theory (Chomsky, 1965) are anchored in generative grammar theories. So-called strong type-transparency approaches (Berwick and Weinberg, 1984) assume that parsing processes are directly driven by the underlying grammatical structure. There are currently a number of competing grammar-based approaches that are based on different grammatical principles. Examples are categorial grammars, head-driven phrase structure grammar, minimalism, and optimality theory (for an overview of current symbolic processing models see Vasishth and Lewis, 2006a). However, empirical data from self-paced reading, eyetracking, and brain imaging studies exhibit difficulty patterns that cannot be explained by a strong linking to competence. A precise model of human performance must account for three aspects of cognition: a) biological constraints, b) gradedness, and c) experience.

a) In contrast to the abstract logical nature of competence theories, human processing performance results from an interaction of linguistic and biological factors. Cognitive psychology is centered around the resource-bounded nature of human cognition. Especially important for real-time cognitive tasks are the limits of short-term memory. Common properties of short-term or working memory are a limited capacity, decay over time, and memory interference. These insights from general cognitive psychology suggest that language processing performance is not only constrained by the principles of working memory, but also relies on processing strategies adapted to these constraints. The latter conclusion is addressed in a symbolically abstracted fashion by ambiguity resolution principles like minimal attachment and late closure (Frazier, 1979) or special low versus high preferences in NP attachment ambiguities (Frazier and Clifton, 1996). A theory addressing processing difficulties caused by capacity and decay is the dependency locality theory (DLT; Gibson, 1998; 2000).

b) Especially important for psycholinguistic models of working memory processes is to account for the "continuous, graded nature of human performance" (Vasishth and Lewis, 2006a). Current comprehension models accounting for this aspect are implemented in activation-based architectures, for example CC-READER (Just and Carpenter, 1992) and the ACT-R-based sentence processing model (Lewis and Vasishth, 2005).
c) The third aspect characterizing human cognition, which is increasingly acknowledged in psycholinguistics, is the influence of experience. Probabilistic models like Jurafsky (1996) use corpus-extracted likelihoods to construct probabilistic grammars, thereby capturing aspects of frequency and plausibility. The scope of these models, however, is mostly confined to the prediction of ambiguity resolution and acceptability ratings. A related theory predicting word-by-word difficulty is the expectation-based approach by Levy (2008). The tuning hypothesis by Mitchell et al. (1995), in contrast, considers higher-level structural regularities the main aspect of experience. Experience, however, is a vague term that relates to a complex interaction of frequencies of words and structures, plausibility, semantic context, and structural regularities. Moreover, the connection between exposure and processing skill is mediated by a learning process, which is in turn constrained by biological factors. Thus the linking from corpus regularities and the like to observable effects is not trivial.

A type of model that provides a promising approach to experience is the connectionist network model (also called a model of parallel distributed processing, PDP). In the past, connectionist models have predominantly been used for word-level tasks like spoken and visual word recognition (McClelland and Elman, 1984; Seidenberg and McClelland, 1989), lexical ambiguity resolution (Kawamoto, 1993), phoneme production (Dell et al., 2002), and past tense acquisition (Rumelhart and McClelland, 1985). However, more and more connectionist approaches are starting to tackle the domain of comprehension (Christiansen and Chater, 1999; Rohde, 2002; Tabor et al., 1997). MacDonald and Christiansen (2002) (henceforth MC02) successfully used a simple recurrent network (SRN; Elman, 1990) to implement their account of skill-through-experience, proposing an interaction of structural regularities with biological factors. Using an SRN, they took advantage of special properties that emerge from interactive, activation-based, parallel distributed connectionism. The inherent architectural properties of such models cause emergent behavior that can be described in terms of limited memory capacity, decay, interference, and continuously graded performance. In addition, their behavior and internal representations are entirely determined by learning. That means that connectionist networks are essentially pure grammar-independent performance models (Christiansen and Chater, 1999; p. 3) that incorporate all three aspects of human language processing mentioned above.
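The kind of architecture MC02 build on can be illustrated with a minimal sketch of an Elman-style SRN forward pass. The code below is a toy, not a reproduction of MC02's network: the vocabulary, layer sizes, and weight initialization are arbitrary assumptions, and training by backpropagation is omitted. It only shows where the network's graded, experience-shaped memory comes from: the context units feed a copy of the previous hidden state back into the current step.

```python
import numpy as np

# Toy Elman-style simple recurrent network (SRN) forward pass.
# Sizes, vocabulary, and initialization are illustrative assumptions;
# training via backpropagation is omitted for brevity.

class SRN:
    def __init__(self, vocab_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        self.W_xh = rng.normal(0.0, 0.1, (hidden_size, vocab_size))   # input -> hidden
        self.W_ch = rng.normal(0.0, 0.1, (hidden_size, hidden_size))  # context -> hidden
        self.W_hy = rng.normal(0.0, 0.1, (vocab_size, hidden_size))   # hidden -> output
        self.context = np.zeros(hidden_size)  # context units hold a copy of h(t-1)

    def step(self, x):
        # The hidden state mixes the current input with the previous hidden
        # state, giving the network a graded, decaying memory of the sequence.
        h = np.tanh(self.W_xh @ x + self.W_ch @ self.context)
        self.context = h
        y = self.W_hy @ h
        e = np.exp(y - y.max())
        return e / e.sum()  # softmax: probability distribution over next words

vocab = ["the", "reporter", "that", "attacked", "senator", "admitted", "error", "."]
net = SRN(len(vocab), hidden_size=16)
probs = None
for word in ["the", "reporter", "that"]:
    x = np.eye(len(vocab))[vocab.index(word)]
    probs = net.step(x)  # prediction for the next word after each step
```

An untrained network like this produces near-uniform next-word distributions; only exposure to a corpus shapes the output activations into the frequency- and regularity-sensitive predictions that the experience-based account relies on.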
A special advantage of experience-based models is that they can account for individual performance differences and language-specific effects in a similar way. Individual differences are addressed in symbolic models like Just and Carpenter (1992) by differences in the capacity limit, but these models do not provide a comprehensive explanation for the origin of those limits. Language-specific effects are different performance patterns on comparable structures in different languages. Evidence for that phenomenon comes from effects of antilocality in head-final languages (Konieczny, 2000) and the forgetting effect in complex center-embeddings in English versus German (Vasishth et al., 2008). Language-specific effects usually exceed the scope of models that do not incorporate experience factors. MC02's SRN model has proven to make consistent predictions concerning global and individual differences in the comprehension of subject and object relative clauses. The aim of the work at hand is to assess this model's predictions on two related phenomena that are most probably the result of language-specific experience: the subject/object difference in Mandarin Chinese and the forgetting effect in multiply embedded object relative clauses in English and German.
S/O Difference in Mandarin Studies on Mandarin relative clauses are inconclusive regarding the preferred extraction type. While all other languages investigated in this respect show a subject preference, Mandarin is a potential exception from that cross-linguistic consistency. This is also what MC02's account of structural regularity is claimed to predict (Hsiao and Gibson, 2003; Kuo and Vasishth, 2007).

Forgetting Effect The forgetting effect refers to a grammaticality illusion in ungrammatical center-embeddings that is present in English but not in German. Grammatical differences between German and English here also suggest an explanation based on experience with structural regularities.

In the remainder of this chapter, the syntactic properties of relative clauses will briefly be introduced. Then four relevant explanatory aspects of psycholinguistic models will be discussed: memory, expectation, canonicity, and experience. Chapter 2 will lay out the two issues of the subject/object difference in Mandarin and the forgetting effect in English and German, and discuss potential explanations. Chapter 3 will then explain the properties of simple recurrent networks and discuss MC02's account in detail. Finally, in chapter 4, SRNs will be used to simulate the two issues addressed here and the resulting predictions will be discussed.
1.2 Relative Clauses and Complexity
The focus of the work at hand is NP-modifying restrictive relative clauses (RCs) like those shown in example (1). The embedded RC is missing an NP, here represented by e_i, which transformational syntax theories interpret as an unpronounced trace of an extraction movement (e.g., Chomsky, 1981). The trace appears either in subject or object position in the embedded clause and is co-indexed with the relative pronoun that, which binds it to the preceding head noun. The position of the trace depends on the extraction type of the RC. In a subject-extracted relative clause (subject relative clause, subject relative, or SRC) as in (1a), the embedded subject reporter is extracted as the subject of the main clause. In an object-extracted relative clause (object relative clause, object relative, or ORC) as in (1b), the extracted element serves as object of the embedded clause and subject of the matrix clause. Hence, in the ORC the noun reporter fulfills two roles.
3
Chapter 1 Prelim<strong>in</strong>aries<br />
(1) a. The reporter that_i e_i attacked the senator admitted the error. (SRC)
    b. The reporter that_i the senator attacked e_i admitted the error. (ORC)
The example shows subject-modifying relative clauses, i.e., they are attached to the subject noun phrase of the matrix clause. RCs can just as well attach to the object. In the object-modifying case the modified noun fulfills two roles in the SRC (object in the main clause and subject in the RC) but not in the ORC. In language comprehension theories the embedded extraction traces are called gaps, which have to be filled to reconstruct the underlying argument structure. That involves identifying a filler (the head noun) and finding the appropriate gap. Theories like the Active Filler Strategy (Frazier and Clifton, 1989; Frazier and Flores d'Arcais, 1989) or the top-down gap-searching mechanism by Lin, Fong, and Bever (2005) deal with this problem in different ways. Theories based on memory assume that unbounded dependencies like non-integrated arguments or fillers have to be stored in linguistic working memory (WM) until the element is reached that is necessary to integrate the dependent into the sentential structure (King and Just, 1991; Just and Carpenter, 1992; 2002; Gibson, 1998; Lewis et al., 2006; Lewis and Vasishth, 2005; Vasishth and Lewis, 2006b). In example (1) the filler noun reporter must be held in memory until the verb attacked is reached, which signals the gap. In (1b) the distance between the filler reporter and its gap is relatively large compared to (1a). The distances between dependent elements can be increased further by multiple RC embedding.

The abovementioned properties make relative clauses especially interesting for psycholinguistic studies of memory processes and gap-filling strategies. An additional point of interest is recursion, which will be discussed now. There are crucial differences in the complexity of embedded SRCs and ORCs. While object relatives in English can be recursively center-embedded in the main clause, multiple embedding of subject relatives results in an iterative right-branching structure. In consequence, ORC embedding causes longer distances between the dependents, illustrated in examples (2a) and (2b) by co-indexation.
(2) Doubly embedded RCs:
    a. The reporter that_1 e_1 attacked the senator that_2 e_2 recognized the officer admitted the error. (SRC)
    b. The reporter that_1 the senator that_2 the officer recognized e_2 attacked e_1 admitted the error. (ORC)
Center-embedded dependencies as in example (2) result from so-called mirror recursion, which can be characterized by the following phrase structure rules:

    X → aXa;  X → bXb;  X → ε    (1.1)

The grammar in (1.1) generates unboundedly deep embedding, producing mirror strings such as abba. Allowing for infinite recursion introduces substantial formal complexity. Unlike the right-branching found in English SRCs, center-embedding recursion exceeds finite-state expressibility.
Mirror recursion is one of three basic recursion types defined by Chomsky (1957) as relevant for natural language. The second type is identity recursion, which produces cross-dependencies of the form abab, as found for example in Swiss German or Dutch relative clauses. Cross-dependencies are very rare but still problematic, because they suggest that language is not context-free. A type even harder to find, if it exists at all, is counting recursion. It is characterized in terms of an artificial language like a^n b^n, where the number of occurrences of b depends on the number of occurrences of a. It is debatable whether this type of recursion exists in natural language (cf. Christiansen and Chater, 1999), but that question is not relevant for this thesis.
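The three dependency patterns can be made concrete with toy string generators over the abstract symbols a and b. The sketch below is purely illustrative; the function names are ad hoc, and the generators make no claims about any particular natural language — they only show which matching pattern each recursion type imposes.

```python
# Toy generators for the three recursion types, using abstract symbols.
# Function names are ad hoc labels for the types discussed above.

def mirror(symbols):
    """Mirror recursion (center-embedding): nested dependencies, e.g. abba.
    The i-th opening symbol is matched by the i-th-from-last closing symbol."""
    return "".join(symbols) + "".join(reversed(symbols))

def identity(symbols):
    """Identity recursion: cross-serial dependencies of the form abab,
    as in Swiss German or Dutch relative clauses."""
    return "".join(symbols) * 2

def counting(n):
    """Counting recursion: a^n b^n, where the number of b's must match
    the number of a's."""
    return "a" * n + "b" * n

print(mirror(["a", "b"]))    # -> abba
print(identity(["a", "b"]))  # -> abab
print(counting(3))           # -> aaabbb
```

In linguistic terms, mirror(["N1", "N2"]) corresponds to the noun-noun-verb-verb nesting of a center-embedded ORC, where the innermost noun must be paired with the first verb encountered.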
Interestingly, human language performance seems unable to handle recursion beyond double embedding. Double center-embedding already poses severe comprehension difficulties and is often rated as ungrammatical (Blumenthal, 1966). In addition, some studies of center-embedding show a grammaticality illusion for ungrammatical structures (Frazier, 1985; Gibson and Thomas, 1999; Vasishth et al., 2008; Christiansen and Chater, 1999). In spite of the fact that cross-dependency is formally most complex, it is center-embedding which is the hardest recursion type for the human comprehender (Bach et al., 1986). The highly demanding dependencies involved in center-embedding and the potential ambiguities produced by multiple gap positions yield processing difficulties that have been extensively investigated in psycholinguistic studies. The human difficulties in processing recursive structures are also evident language-independently: for example, center-embedded dependencies between digits and letters produce difficulties similar to those in language processing (Larkin and Burns, 1977). Current symbolic psycholinguistic theories close the obvious gap between grammatical competence and empirical performance by invoking memory limitations, decay, and attention span, or by explicitly defining a limit on the number of recursion levels. However, a question that arises is whether recursion should be assumed in the human processor at all. Since competence cannot be directly assessed, it can be empirically accessed only through the link to performance. But that link in turn depends on the underlying competence theory. This results in the non-falsifiability of an infinite competence, which Christiansen (1992) calls the Chomskian paradox.
“In particular, I suggest that recursion is a conceptual artifact of the competence/performance distinction [...], instead of a necessary characteristic of the underlying computational mechanism.” (Christiansen, 1992, p. 1)
As will be shown in chapter 3, connectionist models are performance models that account for memory limitations, recursion limits, and the characteristics of different recursion types with respect to human comprehension performance. The following section introduces four aspects of psycholinguistics relevant for this thesis.
1.3 Psycholinguistic Aspects
Chapter 1 Preliminaries
Four major aspects considered in sentence processing theories are of relevance for the work at hand: memory, expectation, canonicity, and experience. I will briefly present each of them here, together with the major related theories.
1.3.1 Memory

Locality and Antilocality
Complex sentences can contain increased distances between dependent constituents, which result in processing difficulty. For example, Grodner and Gibson (2005) found in a self-paced reading experiment that an increased distance between a verb and its subject correlates with increased reading times at the main verb (example 3).

(3) a. The nurse supervised the administrator while . . .
    b. The nurse from the clinic supervised the administrator while . . .
    c. The nurse who was from the clinic supervised the administrator while . . .

The dependency relations in example (3) are indicated in bold type face. In (3a) the dependent subject (the noun nurse) can be integrated with its head (the verb supervised) immediately, since no material intervenes before the verb. In (3b) and (3c) the noun-verb distance increases because other dependents or heads intervene. This leads to increased processing time at the integration site, called a locality effect.
Locality effects are most commonly attributed to the limited nature of linguistic working memory, which constrains the process of integration. In order to bind two dependent elements in the sentence structure, the dependent has to be held in memory until its head is reached. Independently motivated properties of working memory are capacity limits, decay, and interference. Capacity poses an upper limit on memory usage and hence on the number of elements held in memory. Furthermore, the representation of an item decays, either as a function of time (Lewis and Vasishth, 2005) or of the complexity of intervening material (Gibson, 1998; Just and Carpenter, 1992), which makes it harder to retrieve properly from memory. Finally, memorized elements that are similar in certain features can be confused. For example, two nouns that are similar in animacy cause similarity-based interference. The best-documented effect attributed to similarity is retrieval interference (Gordon et al., 2001; 2002; 2004; 2006; Lewis and Vasishth, 2005; Van Dyke and Lewis, 2003; Van Dyke and McElree, 2006). Under this interpretation, the process of retrieving a dependent noun at the head region is impaired because access is mediated by so-called retrieval cues that correspond to feature-value pairs. When several nouns share similar features, they cannot be distinguished by the retrieval cues. Encoding a noun that shares features with an already encoded noun can be subject to encoding interference (Gordon et al., 2004). And
finally, similarity-based interference is sometimes also said to affect processing between encoding and integration; this is called storage interference (Lewis et al., 2006).
In contrast to the locality effects found in English, opposite effects have been observed in head-final languages. For example, Konieczny (2000), Vasishth and Lewis (2006b), and Gibson et al. (2005b) found so-called antilocality effects in German, Hindi, and Japanese, respectively. Konieczny (2000), for instance, used German stimuli (4) comparable to those used by Grodner and Gibson (2005) in their English study.

(4) a. Er hat den Abgeordneten begleitet, und . . .
       He has the delegate escorted, and . . .
       “He escorted the delegate, and . . . ”
    b. Er hat den Abgeordneten ans Rednerpult begleitet, und . . .
       He has the delegate to_the lectern escorted, and . . .
       “He escorted the delegate to the lectern, and . . . ”
    c. Er hat den Abgeordneten an das große Rednerpult begleitet, und . . .
       He has the delegate to the big lectern escorted, and . . .
       “He escorted the delegate to the large lectern, and . . . ”

Reading times at the main verb begleitet were fastest in (4c) and slowest in (4a), contradicting locality predictions. Since antilocality effects were at first discovered only in head-final languages, it is commonly assumed that language-specific word order regularities cause the divergent effects. Whereas theories based on integration cost (e.g., the DLT: Gibson, 1998) cannot account for antilocality, expectation-based theories (Hale, 2001; Levy, 2008) predict antilocality effects to be caused by an increasing expectation for the verb as more intervening material is read. Notably, expectation theories predict antilocality language-independently. Their predictions receive support from recent evidence (Jaeger, Fedorenko, Hofmeister, and Gibson, 2008) for the presence of antilocality even in non-head-final languages (for an overview of possible explanations concerning locality and antilocality see Vasishth, 2008).
The Dependency Locality Theory
Gibson (1998; 2000) formulated a theory of capacity and decay in working memory in a discrete symbolic fashion. Its cost predictions were based on dependency-induced predictions of syntactic nodes and their distances, which is why the original theory was called the Syntactic Prediction Locality Theory (SPLT). The later, revised version is called the Dependency Locality Theory, referred to as the DLT. The DLT assigns a memory cost to each word in a sentence on the basis of two discrete functions: Integration Cost and Storage Cost. Integration Cost directly accounts for locality effects through a discrete distance measure. It predicts the amount of processing difficulty at the integration site and is defined by the number of discourse referents intervening between the dependent and its head. Valid discourse referents in Gibson’s sense are
referential constituents like nouns and main verbs, as they refer to objects and events, respectively. Pronouns, however, do not induce memory cost because they are assumed to be immediately accessible. The assumption behind Integration Cost is that every stored item receives an activation which decays with the number of discourse referents newly encoded while it is maintained in memory. Integrating an element, i.e., relating it to its head, requires more processing effort when the element has less activation. Thus the Integration Cost is a function that increases monotonically with the number of intervening discourse referents. The cost accounts for decay over time only implicitly, since time is represented discretely by successive discourse referents. Integration Cost is measured in energy units (EUs).
The memory capacity limit is accounted for by the second principle of the DLT: Storage Cost. It rests on the assumption that the parser constantly predicts the most probable complete sentence structure given the previous material and keeps it in memory. Structural complexity is measured by the number of syntactic heads the predicted structure contains: the more complex the structure, the more syntactic heads. Every predicted head uses up memory resources, so-called memory units (MUs). Memory load also affects processing, because storage and processing use the same resources (Just and Carpenter, 1992). Consequently, the more heads are predicted, the higher the processing cost. The important difference between the two costs is the location of their effects. While Integration Cost accounts for processing differences only at the integration site, Storage Cost for a predicted structure affects the processing of every following part of the sentence. Figure 1.1 shows the Integration Cost C(I) and the Storage Cost C(S) at each point in an English object relative clause. Seeing the sentence-initial determiner
ORC     The  reporter  who_i  the  senator  attacked e_i  admitted  the  error
C(I)     0      0        0     0      0        1+2           3        0    0+1
C(S)     2      1        3     4      3         1            1        1     0
Total    2      1        3     4      3         4            4        1     1

Figure 1.1: DLT cost metrics for an English ORC according to Gibson (1998).
induces the prediction of a main clause; hence predictions for an NP and a main verb have to be stored. Note that the DLT considers the prediction of the main verb to be cost-free, but in the literature it is mostly assigned a cost. For simplicity, in this work a Storage Cost for the main verb will be assumed consistently. Once the NP is complete, only the verb remains predicted. At the relative pronoun who a Storage Cost of 3 is assigned because an embedded SRC is predicted, containing two heads: the embedded verb and a subject gap. Seeing another determiner changes the prediction into an ORC, which contains one more head, namely the embedded subject. At senator only the embedded verb, the object gap, and the main verb remain predicted. On the embedded verb attacked, two integrations then take place. The subject integration of attacked costs 1 EU because the
verb counts as a new discourse referent. Establishing the relation between the relative pronoun who and the empty element consumes 2 EUs because two discourse referents (senator and attacked) have been processed in the meantime. The biggest cost is assigned to the main verb (admitted): here the subject reporter is integrated after three new discourse referents have been processed (the embedded subject, the embedded verb, and the main verb itself). Additionally, an NP head is predicted because the verb is transitive. The last integration takes place at the final word, error. The integration of the determiner produces no cost, whereas building the structural relation with its head admitted consumes 1 EU. Altogether, the total cost predicts the highest difficulty at the main verb, resulting from the long distance to its dependent arguments. Using the total cost as a reading-time predictor, the DLT predicts locality effects perfectly.
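The bookkeeping behind Figure 1.1 can be reproduced in a few lines. The sketch below simply encodes the figure's per-word costs and sums them; it is a toy transcription of the cost table, not an implementation of the DLT's linguistic machinery:

```python
# DLT costs for "The reporter who the senator attacked admitted the error",
# transcribed from Figure 1.1. Each entry gives the word, its integration
# cost components (in EUs), and its storage cost (in MUs).
ORC_COSTS = [
    ("The",          [0],    2),
    ("reporter",     [0],    1),
    ("who",          [0],    3),
    ("the",          [0],    4),
    ("senator",      [0],    3),
    ("attacked e_i", [1, 2], 1),  # verb as new referent (1) + who-gap (2)
    ("admitted",     [3],    1),  # subject integrated over three referents
    ("the",          [0],    1),
    ("error",        [0, 1], 0),  # free determiner + head integration (1)
]

def total_costs(entries):
    """Total cost per word: summed integration components plus storage cost."""
    return [(word, sum(integration) + storage)
            for word, integration, storage in entries]
```

The peak totals fall on the two verbs, where the long-distance integrations take place, which is exactly the locality prediction discussed above.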
CC-READER
A computational model whose basic assumptions are consistent with the DLT's cost assignment is the Capacity Constrained READER, or CC-READER (Just and Carpenter, 1992; 2002), the successor of READER, a sentence reading model implemented in a framework called CAPS (Collaborative Activation-based Production System). CC-READER is an activation-based simulation of linguistic working memory processes, with limited capacity as the explanatory factor for memory load effects and individual differences. The constituting mechanism of CAPS is activation propagation caused by symbolic production rules. Productions and stored elements (words, structures, propositions, etc.) use the same resources. The availability of elements, as well as the applicability of productions, depends on their received activation exceeding a certain threshold. The condition for a production rule is met when the activation threshold of the respective source element is reached. An important architectural property of CC-READER is the use of processing cycles. In each processing cycle all currently applicable production rules fire simultaneously, meaning that they propagate weighted activation from the source to a “target element”. In this sense, capacity is defined as the maximum amount of activation available to all productions and stored elements per processing cycle. In case of an activation shortage, thresholds can still be reached by incremental production firing, resulting in more processing cycles. This is what happens when the total activation has to be “scaled back” because it exceeds the capacity limit. Such back-scaling affects the activation of both productions and stored items: “Any shortfall of activation is assessed against both storage and processing, in proportion to the amount of activation they are currently consuming” (Just and Carpenter, 1992, p. 135). Reading word by word, the parsing process also depends on lexical access and constructs a representation of the sentence that includes thematic role information. Slow-downs in processing are represented by an increased number of processing cycles. The two concepts of activation propagation and processing cycles enable CC-READER to predict (a) reading slow-downs in demanding sentence regions (due to the storage of many elements or the firing of many productions) and (b) region-specific individual differences in reading, both
empirically consistent, as comparisons with studies like King and Just (1991) show. All predicted processing difficulties are explained by the demand for activation. This also covers individual differences, by ascribing them to different limits on the total amount of activation. The effect of decay is accounted for only indirectly, depending on the number of newly activated elements, which is very similar to the DLT's Integration Cost. The capacity limit causes newly needed activation to be drawn preferentially from older elements. This results in a continuously graded decay which, however, is not temporally dependent but instead depends on storage and processing demands, just as in the DLT.
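The proportional "scaling back" quoted above amounts to simple arithmetic: when the total demanded activation exceeds the cap, every consumer, whether production or stored element, is cut by the same factor, so reaching a threshold takes more cycles. The function below is my own sketch of that idea, not CAPS code:

```python
def scale_back(demands: dict, capacity: float) -> dict:
    """Proportionally rescale activation demands to fit a capacity cap.

    Mirrors Just and Carpenter's (1992) idea that any shortfall of
    activation is assessed against storage and processing in proportion
    to the activation each is currently consuming.
    """
    total = sum(demands.values())
    if total <= capacity:
        return dict(demands)  # no shortage: demands are met as-is
    factor = capacity / total
    return {name: amount * factor for name, amount in demands.items()}
```

For instance, with demands of 30 units for a stored NP and 50 and 20 units for two firing productions under a cap of 80, every consumer is scaled by 0.8; the larger production delivers only 40 units this cycle and therefore needs additional cycles to push its target over threshold.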
ACT-R Sentence Processing Model
Another computationally implemented sentence processing model (Lewis and Vasishth, 2005; Lewis et al., 2006; Vasishth and Lewis, 2006b) is built on the cognitive architecture ACT-R (Anderson and Lebiere, 1998; Anderson et al., 2004). Like the CAPS model, ACT-R works on the basis of sub-symbolic activation propagation. Rule application, however, follows a more symbolic condition-action scheme. Processing difficulties are predominantly retrieval-based. Elements (memory chunks) involved in a production, such as lexical entries, need to be retrieved from declarative memory. The success of the retrieval process depends on the chunk's current activation level and its match with the retrieval cues specified in the production condition. Retrieval cues are feature-value pairs that increase the activation of chunks depending on the number of matched features (associative activation). The total activation of a memory chunk, calculated from its activation level and the cue-based activation, determines its probability of being retrieved as well as its retrieval latency. The possibility that several chunks match the retrieval cues partially enables the model to simulate associative retrieval interference. Retrieval interference distributes associative activation over several lexical entries, causing latencies and potentially the retrieval of the wrong chunk. How severely interference affects retrieval depends on the aforementioned activation level. Activation is a fluctuating value, a function of usage and of decay over time. Cue-based activation and retrieval of a particular element reactivate it, which slows down the decay process. The parsing process of the ACT-R sentence processing model (Lewis and Vasishth, 2005) is a combination of a left-corner incremental structure building mechanism and a top-down, goal-guided syntactic expectation that specifies the phrasal category of the structure to be constructed. A very unconventional assumption of the model is that, in spite of incremental parsing, the memory representation contains no serial order information that could guide retrieval and attachment preferences. Recency is accounted for only implicitly, by the decay function that affects parsing decisions in addition to cue matching. What differentiates this model from CC-READER and the DLT is its account of interference effects and its temporal decay function. Furthermore, processing difficulty is represented not by processing cycles but directly by estimated processing time. Setting retrieval cues, structural attachment, and shifting attention to the next word have fixed time
values that, in combination with the activation- and interference-based retrieval latency, constitute the predicted reading time for a word. Thus, while CC-READER and the DLT make predictions based on resource management, the predictions of the ACT-R model are based on language-independent, psychologically motivated latencies. The ACT-R sentence processing model has shown considerable consistency with empirical data regarding ambiguity resolution, reanalysis, center-embedding complexity, and extraction preferences in relative clauses (Lewis and Vasishth, 2005). Additionally, the cue-based reactivation mechanism accounts for antilocality effects in certain contexts: additional material containing pronominals or other expressions referring to the previously mentioned dependent can reactivate the chunk representing that dependent and thus boost its activation, making it faster to retrieve at the head.
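The interplay of decay, reactivation, and cue-based boosting described above can be illustrated with ACT-R's standard activation equations. The sketch below is a simplification: activation noise and the mismatch penalty are omitted, and the latency factor F is an illustrative value, not a parameter taken from Lewis and Vasishth (2005):

```python
import math

def base_level(times_since_use, d=0.5):
    """ACT-R base-level activation: log of summed power-law decayed
    traces of past uses; d is the decay rate (0.5 by convention).
    Each reactivation adds a trace and so slows effective decay."""
    return math.log(sum(t ** -d for t in times_since_use))

def retrieval_latency(base_activation, cue_boost, F=0.14):
    """Retrieval latency F * exp(-A), where the total activation A
    combines the decaying base level with associative activation from
    matching retrieval cues. Noise and mismatch penalties are omitted;
    F is an illustrative latency factor."""
    return F * math.exp(-(base_activation + cue_boost))
```

A chunk reactivated recently (e.g., by an intervening pronoun referring to it) has a higher base level and is therefore retrieved faster at the head than a chunk whose last use lies further back, which is the mechanism behind the antilocality account just described.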
1.3.2 Expectation
A very different approach from serial, memory-based resource management theories is to ascribe reading time effects to the context-dependent plausibility of the evolving structure. For example, the predictability of a word in a given context, empirically quantified by the Cloze completion task (Taylor, 1953), has a considerable effect on eyetracking and ERP measures in sentence processing (Ehrlich and Rayner, 1981). A theoretically related measure is surprisal (Hale, 2001). Surprisal as used in Levy (2008) is a probabilistic, grammar-based approximation to the negative log Cloze probability, but in fact yields a better fit to the data due to its logarithmic scaling of effects. Levy (2008) proposes a theory of probabilistic ambiguity resolution by parallel plausibility ranking of possible structures. For a partial string, all complete structures that include the input seen so far as a prefix are considered. At every word w_i a probability distribution P_i(T) is assigned over all possible continuations T, ranking the most probable structures highest. Following Levy, the probability distribution can most straightforwardly be based on a probabilistic grammar extracted from annotated corpora; but he makes no binding commitment in this respect, because the source of the probability or plausibility distribution could in principle also be semantic, phonological, or the like. Since in incremental parsing the predictions change over time, the distribution P_i(T) has to be updated with every new input word. A re-ranking of the preferences during the update process due to an unexpected word is regarded by Levy as a kind of reanalysis, inducing difficulty. In this sense the concept of difficulty prediction is equivalent to surprisal. Levy, however, defines difficulty as the relative entropy between the two probability distributions before and after the update. Consequently, the more the re-ranked distribution differs from the original, the higher the processing cost. This is related to reanalysis as in Frazier et al. (1983) and other approaches, with the difference that Levy's (2008) expectation-based theory is not serial but assumes the parallel maintenance of all possible (or at least the most probable) structures.
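Both difficulty metrics mentioned here can be stated compactly. The sketch below uses toy probability dictionaries in place of a real probabilistic grammar; it is my own illustration of the two formulas, not Levy's implementation:

```python
import math

def surprisal(p):
    """Surprisal of a word given its context: -log2 P(w | context), in bits."""
    return -math.log2(p)

def relative_entropy(p_after, p_before):
    """KL divergence D(after || before) between the distributions over
    structures before and after an update -- Levy's (2008) measure of
    how much the plausibility ranking shifts on a new word."""
    return sum(q * math.log2(q / p_before[t])
               for t, q in p_after.items() if q > 0)
```

An unexpected word with P = 0.05 carries about 4.3 bits of surprisal, and an update that reverses an 80/20 preference between two candidate structures yields a relative entropy of 1.2 bits, whereas an update that leaves the ranking unchanged costs nothing.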
In contrast to connectionist prediction models, in which performance depends on how well the network has extracted probabilistic constraints from the input material, in Levy's approach the structural probabilities of the grammar are perfectly known to the parser. Consequently, Levy's parallel probabilistic resource allocation theory constitutes a sort of competence model, with surprisal or relative entropy as a “bottleneck” to comprehension, thereby yielding performance-related predictions. The predictions of frequency-based approaches like the tuning hypothesis (Mitchell, Cuetos, Corley, and Brysbaert, 1995) are quite similar to surprisal most of the time but differ fundamentally in head-final structures. Similarly, the DLT and surprisal make comparable predictions only in structures that are not head-final. In head-final constructions the preceding dependents provide statistical information about the nature of the head, thus narrowing the prediction. According to the theory, a better prediction (i.e., lower surprisal) facilitates integration at the head. Thus an expectation-based theory predicts antilocality effects in head-final structures language-independently.
1.3.3 Canonicity
In the literature, the term canonicity with respect to word order is often used as a synonym for regularity and structural frequency. Here these terms shall be distinguished in order to formulate the respective theories clearly. A theory of canonicity has to answer two questions:

1. Which categorial domain is the focus of the canonicity?
2. What makes specific structures canonical?

The categorial focus of canonicity can be grammatical functions, thematic roles, letter sequences, prosody, and the like. The specific structures counting as canonical in these domains can be chosen on the basis of structural regularity, complexity, or simply convention.
The most common canonicity account goes back to Greenberg (1963); it relates to the basic grammatical functions subject, object, and predicate and is justified by structural regularities. Greenberg classified languages in terms of their canonical word order. He and the subsequent literature count English as a subject-verb-object (SVO) language because simple sentences and most subordinate constructions follow that order. English thus belongs to the second largest class (41.79% of the world's languages), preceded by the SOV order, attributed to 44.78% of languages (according to Tomlin, 1986). However, the classification is not equally clear for all languages. German is arguably an SOV language, although its simplest sentence structure is built with SVO order, as in English. Erdmann (1990), for example, concludes that German does not fulfill all requirements for an SOV language and should therefore be categorized as SVO.
As mentioned above, structural regularity based on corpus occurrences is not the only possible grounding for a canonicity account. A generative grammar-based account that relates word order canonicity to language processing assumes the language-specific canonical structure to be an internal representation underlying the surface structure (Lin et al., 2005). Thus, in order to comprehend a non-canonically structured sentence, the parser has to transform it back into the canonical order. This extra processing makes non-canonical structures harder to comprehend than sentences mirroring the underlying order. Supported by evidence from Ferreira (2003), this theory extends to thematic roles, which seem to be assigned by heuristics based on a canonical argument structure.
A fully heuristic language processing account has been proposed by Bever (1970). He defined several comprehension strategies that rely on superficial structural similarities. One of them (Strategy D) concerned thematic role assignment:

Strategy D: "Any Noun-Verb-Noun (NVN) sequence within a potential internal unit in the surface structure corresponds to 'actor-action-object'." (Bever, 1970)

Bever's strategies cover several categorial levels of structural regularity, reaching from phonemes to complex phrases and clauses. The strategies designate specific structures that seem to appear regularly at different levels as basic or canonical structures, which are predicted or expected by default. Concerning processing difficulty, this account predicts harder processing for structures that do not fit the strategy templates.
Canonicity accounts are closely related to structural frequency. Although it is not obligatory, most accounts that use heuristics base their choice of canonical structures on the frequencies of those structures in language usage. Greenberg's original classification of languages into SOV, SVO, etc. was clearly based on frequently occurring orderings. On the other hand, a theory assuming base-generated orderings in the deep structure of a generative grammar can, of course, argue completely independently of frequencies and, for instance, refer to universal grammar specifications. Likewise, Bever's strategic preferences, although apparently correlating with frequent structures, could be claimed to stem from innate universal principles that help children learn a language. Detailed predictions of a word order canonicity account will be discussed in 2.3.
1.3.4 Experience

Experience-based theories assume that parsing strategies and processing efficiency are shaped by exposure to language.

Structural Frequency

The literature shows that corpus frequencies can be a good predictor of comprehension difficulty. A natural assumption is that structures that are used more often should be easier to comprehend than structures that are rarely produced. This assumption implies a parallelism between language production and comprehension. There are roughly two possibilities to explain that parallelism. One explanation posits a causal relation between production and comprehension, meaning that exposure to particular structures shapes the ability to comprehend them. The other explanation assumes that the underlying processes of production and comprehension are basically the same and hence are limited by the same constraints. An experience-based account clearly favors the former explanation, which does not exclude the second possibility but does not depend on it.
The Structural Grain Size

A serious problem for symbolic theories of exposure-based parsing decisions is the question of grain size. As in canonicity accounts, the question to answer is at which structural level information should be considered to affect parsing decisions. A symbolic exposure-based account like that of Mitchell, Cuetos, Corley, and Brysbaert (1995) tabulates the frequencies of specific structures. For each relevant structure there is a table listing its different interpretations (e.g., the attachment site in complex noun phrases). When the parser processes an ambiguous construction, the most frequent of the relevant recorded structures (frames or partial syntactic representations) is merged with the current sentence structure to yield a predicted disambiguated structure.

"The success of this process depends upon establishing a useful link between aspects of the current material and corresponding features of the established records. This is essentially a category selection or pattern-matching problem." (Mitchell et al., 1995, p. 470)
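A minimal sketch of such a record-and-match mechanism may help. The frame labels, function names, and counts below are invented for illustration; this shows the general idea, not Mitchell et al.'s actual implementation:

```python
from collections import Counter

# Each record pairs a structural frame (at some chosen grain size)
# with one of its interpretations, e.g. an attachment site.
records = Counter()

def observe(frame, interpretation):
    """Store one disambiguated occurrence of a structural frame."""
    records[(frame, interpretation)] += 1

def resolve(frame):
    """Pattern-match the current material against the established
    records and return the most frequent interpretation, if any."""
    options = {interp: n for (f, interp), n in records.items() if f == frame}
    return max(options, key=options.get) if options else None

# Hypothetical Spanish-like exposure: RC attachment in NP-PP-RC frames
for _ in range(60):
    observe("NP-PP-RC", "high")
for _ in range(40):
    observe("NP-PP-RC", "low")

print(resolve("NP-PP-RC"))  # high
```

Note that the chosen grain size is baked into the frame labels: a coarser model would pool NP-PP-RC and NP-PP-PP occurrences under one frame, a finer one would split them further.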
The recorded structures can be specified in deep detail, say, on a lexical level, or more abstractly, e.g., on the level of phrasal categories. Example (5) (Mitchell et al., 1995) contains a global ambiguity. In (5a) the RC who was outside the house can be attached to either the first noun wife or the second noun football star. The same is true for the PP outside the house in (5b).

(5) a. Someone stabbed the wife of the football star who was outside the house.
b. Someone stabbed the estranged wife of the movie star outside the house.

In an exposure-based account the parser's decision for noun one (high) or noun two (low) attachment depends on the corpus frequencies of both possibilities. These frequencies could be calculated at several possible structural levels. For example, frequencies could be tabulated individually for each construction, recording attachment preferences for NP-PP-RC structures separately from NP-PP-PP structures. Alternatively, the preferences could be tabulated for both constructions pooled together by recording the occurrences of the more abstract NP-PP-(modifying constituent) structure. The choice of grain size crucially affects the theory's predictions. A record level that is too fine-grained is in danger of missing some affected constructions. A very abstract level, on the other hand, can lead to overgeneralization. Mitchell, Cuetos, and colleagues categorize existing exposure-based models into a) fine-grained (Spivey-Knowlton, 1994), b) coarse-grained (Cuetos et al., 1996), and c) mixed-grain models. Connectionist network models like MacDonald et al. (1994) and Juliano and Tanenhaus (1994) are counted into the third category. The account of Bever (1970) is also a mixed-grain account. Mitchell et al. basically argue for
a coarse-grained approach. They present empirical evidence against sub-classifying structures by noun type, animacy, or the like. In French, for example, the statistics over all NP-PP-RC structures make the correct attachment prediction (a high-attachment preference), while including statistical information about definiteness and other aspects of the noun phrases leads to the wrong predictions. Fine-grained information evidently has to be ignored in that case. In a sentence completion study, Corley and Corley (1995) found evidence that noun phrase attachment preferences in English do not rely on lexical data. They analyzed the by-subject variance of two studies involving the same structures but different lexical items. The lexical alternation did not affect the (low) attachment preference. Interestingly, for noun phrase attachments with two potential attachment sites there is a high-attachment preference in most languages. Exceptions are English, German, Italian, and Swedish, where low attachment is generally preferred. Mitchell et al. (1995) argue that such general preferences are only explainable in coarse-grained models. Cuetos et al. (1996) report that corpus frequencies predict the attachment preferences of two-NP-site ambiguities¹ for Spanish and English. For Spanish, which shows a high-attachment preference, Cuetos and colleagues found that 60% of the NP-PP-RC constructions in the corpus had the RC attached to the first NP. For English (low-attachment preference) they found only 38% high attachment in the
corpus. Desmet and Gibson (2003) argue against committing a model to one grain size level by showing that in some cases NP attachment preferences are affected at the lexical level. Desmet and Gibson studied human preferences in eyetracking as well as corpus frequencies of three-NP-site noun conjunction ambiguities (see example 6). They found that the corpus frequencies support the empirical results, which show a preference for middle over high attachment. Replacing the noun inside the attached phrase with the pronoun one, however, turned the preference into high > middle rather than middle > high (Gibson and Schütze, 1999).

(6) A column about a soccer team from the suburbs and. . .

a. an article/one about a baseball team from the city were published in the Sunday edition. (high)
b. a baseball team/one from the city was published in the Sunday edition. (middle)

A similar effect was obtained for German two-NP-site ambiguities (Hemforth et al., 2000): an ambiguity containing an anaphoric binding (e.g., a relative clause) produces a high-attachment preference, while without the anaphoric binding it results in a low-attachment preference. This leads Desmet and Gibson (2003) to conclude that in addition to structural information the occurrences of pronouns have to be tabulated as a
predictor. This shows, as Mitchell et al. (1995) also admit, that the exposure-based approach has to find a balance between coarse and more fine-grained measures, and that different structures might require different grain sizes for the tabulation of frequencies.

¹ Two-NP-site ambiguities refer to constructions in which there are two preceding NPs to potentially attach to.
Structural Frequency in a Connectionist Network

One type of model that specifically bases its predictions on records of structural frequencies is the connectionist network model. Mitchell et al. (1995) state that "in a connectionist system the records would consist of a set of activation weights distributed throughout the network" (p. 472). This is only partly accurate. The network does not explicitly count frequencies, nor are frequencies stored anywhere in the network. Rather, every exposure of the network to a specific structure immediately changes the weight distributions and thus the whole behavior of the network. One could therefore say that the weight distributions contain implicit structural knowledge. This can be observed, for example, in the activations of hidden layers: inputs of similar structures result in similar activation patterns in the hidden layer of simple recurrent networks (SRN; Elman, 1990). The comparison of these patterns reveals the structural generalization levels that drive the network's predictions. In contrast to symbolic exposure-based accounts, there is no explicitly fixed structural grain size that the network is sensitive to. There is, of course, a lower limit to the grain size, defined by the encoding level of the input. If the input string is encoded at the word level, the network has no information below that level to work with. The upper limit depends on the network's architecture and can be affected by the size of the hidden layer, the learning mechanism, and, specifically for recurrent networks, by their "memory span". Which levels the network actually chooses is hard to say in advance. Learning is a walk through a state space in search of the optimal solution to the desired input-output pairing. The choice of grain size is part of that optimization process and can change during the learning phase. A commitment to a specific grain size implicitly involves a commitment to the number of structures to distinguish. A fine-grained model will consequently have to keep apart many structural representations, while a very coarse-grained model has only few structures to deal with. This relation means that a network with very few hidden nodes through which the information has to be passed will only be able to make very high-level generalizations. The final choice of grain size will ideally be the most useful structural level of the internal input representation for meeting the output requirements, given the network's architecture. In chapter 3 the properties of connectionist networks will be discussed in more detail.
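The claim that weights encode frequency information only implicitly can be illustrated with a miniature Elman-style SRN. All numbers, the three-word toy vocabulary, and the truncated one-step learning rule are simplifying assumptions made for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["a", "b", "c"]
V, H = len(vocab), 8

# Elman-style SRN: input and previous hidden state feed a hidden layer,
# which predicts the next word. No frequency counts are stored anywhere.
Wxh = rng.normal(0.0, 0.1, (H, V))
Whh = rng.normal(0.0, 0.1, (H, H))
Who = rng.normal(0.0, 0.1, (V, H))

def one_hot(word):
    x = np.zeros(V)
    x[vocab.index(word)] = 1.0
    return x

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def train_step(word, nxt, h, lr=0.1):
    """One next-word prediction with a truncated (1-step) gradient update."""
    global Wxh, Whh, Who
    x, target = one_hot(word), one_hot(nxt)
    h_new = np.tanh(Wxh @ x + Whh @ h)
    y = softmax(Who @ h_new)
    dy = y - target                        # cross-entropy gradient
    dh = (Who.T @ dy) * (1.0 - h_new**2)   # backprop into the hidden layer
    Who -= lr * np.outer(dy, h_new)
    Wxh -= lr * np.outer(dh, x)
    Whh -= lr * np.outer(dh, h)
    return h_new

# Exposure: the continuation "a b" is four times as frequent as "a c".
h = np.zeros(H)
for _ in range(200):
    h = train_step("a", "b" if rng.random() < 0.8 else "c", h)

# The frequency asymmetry now lives only in the weight distributions:
h_test = np.tanh(Wxh @ one_hot("a") + Whh @ np.zeros(H))
probs = softmax(Who @ h_test)
print(probs[vocab.index("b")] > probs[vocab.index("c")])  # True
```

After training, asking the network what follows "a" yields a stronger activation for the frequent continuation, even though no count was ever recorded; the "record" exists only as a weight configuration, exactly in the sense discussed above.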
Frequency and Regularity

Structural regularity is the occurrence of similarities between different structures at a certain grain level. For example, the English SRC is more regular than the ORC because at the level of functional categories (SVO) the SRC is similar to many other structures. In contrast, the corpus frequency of OSV, as in the ORC, is very low. In that sense, regularity is nothing other than frequency. One can speak of a regularity effect when, for instance, structures or tokens that are not highly frequent themselves receive a sort of "neighbor benefit" from frequent structures that are similar at a certain level. Benefit is meant here in the sense of a facilitating frequency effect. For example, in word recognition, regular words (with respect to orthography-pronunciation correspondence) are easier to recognize than exceptional words, although a regular word is not necessarily more frequent. It merely shares sub-regularities (i.e., similarities at a lower level) with other words, which has a facilitating effect. Since irregular words do not have such a "neighbor benefit", this leads to the frequency × regularity interaction that is implemented, for example, in Seidenberg and McClelland (1989). The interaction refers to the fact that there is a recognition performance difference between high- and low-frequency irregular words, while this difference is absent for regular words. Seidenberg and McClelland's model is a connectionist architecture, which predicts that interaction due to its learning mechanism. It is the same interaction that the SRN in MacDonald and Christiansen (2002) predicts for English subject and object relative clauses, with ORCs regarded as irregular.
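The shape of this interaction can be mimicked with a deliberately simple toy computation. The frequencies, the neighbor-support term, and the saturating activation below are all invented assumptions; this is a sketch of the ceiling-effect logic, not Seidenberg and McClelland's actual architecture:

```python
import math

def familiarity(own_freq, neighbor_freqs):
    """Toy recognition score: log-compressed support from the word's own
    frequency plus its similar "neighbor" patterns, squashed through a
    saturating nonlinearity so that strong support hits a ceiling."""
    support = math.log1p(own_freq) + sum(math.log1p(f) for f in neighbor_freqs)
    return math.tanh(0.2 * support)

neighbors = [500, 400, 300, 250]      # a regular word's rime neighbors
hf_regular = familiarity(1000, neighbors)
lf_regular = familiarity(10, neighbors)
hf_irregular = familiarity(1000, [])  # irregular words lack such support
lf_irregular = familiarity(10, [])

# Frequency x regularity interaction: the frequency effect is large for
# irregular words but nearly absent for regular ones (ceiling effect).
print((hf_irregular - lf_irregular) > (hf_regular - lf_regular))  # True
```

Regular words sit near ceiling thanks to their neighbor support, so their frequency difference is compressed; irregular words have only their own frequency to rely on, so the frequency effect surfaces, which is the interaction described above.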
Chapter 2

Issues in Relative Clause Processing

2.1 The Subject/Object Difference

One of the issues that shall be addressed in this work is the processing difference between subject and object relatives. The subject/object difference is a phenomenon extensively discussed in the literature. Studies in many languages show that subject relatives are easier to comprehend than object relatives (see table 2.1 for an overview). The studies are cross-linguistically consistent enough to speak of a universal subject preference. An exemplary study of English RCs that will be of further relevance for the work at hand is King and Just (1991). For that reason I will briefly describe their experiment.
King and Just (1991) King and Just conducted a self-paced reading¹ study of English RCs like the one in example (1), repeated here as (7). Before the experiment the participants were grouped by their reading span value, obtained with a reading span test (Daneman and Carpenter, 1980). The span value is assumed to be associated with individual memory capacity. The test value was used to group participants into high-, mid-, and low-span readers. The reading time analysis yielded the following results: a) there was a global memory span effect, with increased reading times for participants with lower span values; b) the memory span effect was larger on the ORC; c) the regions of greatest difficulty were the embedded verb (attacked) and the main verb (admitted); and d) the ORC was read more slowly overall than the SRC. The results showed that readers spent more time on the embedded and the main verb in object relative clauses compared to subject relative clauses. Additional comprehension questions yielded significantly lower accuracy for low-span compared to high-span readers, showing that not only was processing slower but comprehension was also worse for participants with lower span values.

¹ Self-paced reading (Just et al., 1982) is a method for recording word-by-word reading times in online sentence comprehension. Participants read a sentence word by word, pressing a button to make the next word appear. Only the current word is shown; the rest of the sentence may optionally be represented by masking characters.

Note that the large extraction type difference on the main verb may be a spillover effect. Grodner and Gibson (2005) carried out a study that used stimuli with intervening material between the verbs to prevent spillover. This study showed that there is indeed
no reading time difference on the main verb caused by extraction type. Also, there was an extraction type × verb (embedded/matrix) interaction. In particular, the reading times on the embedded and the main verb yielded the pattern V_emb < V_main in the SRC and the opposite pattern (V_main < V_emb) in the ORC.

Figure 2.1: Reading times for English relative clauses by reading span value (low, mid, and high) (King and Just, 1991). [The figure itself is not reproduced here; it plots reading time per word for successive areas of subject- and object-relative sentences and shows that the span-group differences are larger for the object-relative construction, particularly at the verbs.]

(7) English SRC and ORC (King and Just, 1991):

a. The reporter that_i e_i attacked the senator admitted the error. (SRC)
b. The reporter that_i the senator attacked e_i admitted the error. (ORC)

(8) German SRC and ORC (Konieczny and Ruh, 2003):

a. Der Wärter, der_i e_i den Häftling beleidigte, entdeckte den Tunnel.
   The guard, who.NOM the.ACC prisoner insulted, discovered the tunnel.
   'The guard who insulted the prisoner discovered the tunnel.' (SRC)

b. Der Wärter, den_i der Häftling e_i beleidigte, entdeckte den Tunnel.
   The guard, who.ACC the.NOM prisoner insulted, discovered the tunnel.
   'The guard who the prisoner insulted discovered the tunnel.' (ORC)

Similar results concerning the subject/object difference were obtained in German, French, Hindi, Japanese, Korean, and other languages, involving different sorts of paradigms like eye-tracking, self-paced reading, and brain imaging techniques (see table 2.1 for references). Explanations for the processing differences between the two types of RCs must ideally
Chapter 2 Issues in Relative Clause Processing<br />
Language Task References<br />
Brazilian Portuguese RSVP Gouvea (2003)<br />
Dutch SPR Frazier (1987)<br />
SPR, eye-tracking Mak et al. (2002)<br />
English Lexical Decision Ford (1983)<br />
SPR King and Just (1991); Gibson<br />
et al. (2005a)<br />
ERP King and Kutas (1995)<br />
fMRI Caplan et al. (2002); Just<br />
et al. (1996)<br />
PET Stromswold et al. (1996)<br />
French phoneme-monitoring Frauenfelder and Segui<br />
(1980)<br />
click-monitoring Cohen and Mehler (1996)<br />
eye-tracking Holmes and O’Regan (1981)<br />
German SPR Schriefers et al. (1995)<br />
ERP Mecklinger et al. (1995)<br />
Hindi SPR Vasishth and Lewis (2006b)<br />
Japanese SPR Miyamoto and Nakamura<br />
(2003)<br />
Korean SPR Kwon et al. (2004)<br />
Table 2.1: A selection of papers reporting a subject preference (extended table originally<br />
from Lin and Bever, 2006b).<br />
cover the global preference in several languages. Hence, inherent differences between<br />
the two constructions have to be found that could account for the diverse processing effects.<br />
The most reliable cross-linguistic difference between SRCs and ORCs is word order.<br />
For example, there is a greater distance between the head noun and the gap in the ORC<br />
in English and German (cf. examples 7 and 8). The dependencies involved are assumed<br />
to be particularly memory demanding and to produce locality effects (e.g. Grodner and<br />
Gibson, 2005) through integration difficulty. A challenge for a cross-linguistic explanation is<br />
that in some languages, e.g. Korean, Japanese, and Chinese, RCs are prenominal, i.e.,<br />
they precede the head noun. Others, like Hindi, use both possibilities (Vasishth and Lewis,<br />
2006b). In most cases the position of the RC before or after the head noun does not<br />
seem to be a confounding factor, as Korean and Japanese align with post-nominal languages<br />
in showing a subject preference. Popular locality-independent word order explanations are<br />
canonicity and frequency: in most languages SRCs have a more canonical word order<br />
than ORCs and, furthermore, a higher corpus frequency. Apart from syntactic properties,<br />
semantic information also plays an important role in the subject/object difference. For<br />
example, experiments by Traxler et al. (2002) showed that animacy and verb-induced<br />
plausibility are crucial predictors for difficulty differences between both constructions.<br />
Although the global subject preference appears robust, there is at least one<br />
exception reported so far: Mandarin Chinese, where RCs precede the head noun, just<br />
as in Japanese and Korean. Hsiao and Gibson (2003) found in an SPR experiment that<br />
in Mandarin subject relatives are in fact harder to comprehend than object relatives.<br />
Interestingly, the literature on Chinese relative clauses has since reported mixed<br />
results. While Lin and Garnsey (2007) and Qiao and Forster (2008) confirmed Hsiao and<br />
Gibson’s results, Kuo and Vasishth (2007) and Lin and Bever (2006b) found a subject<br />
preference. The apparently unresolved question about Mandarin Chinese might tip the<br />
scales in the search for a globally consistent theory of relative clause comprehension.<br />
Theories like Gibson’s (1998) Dependency Locality Theory, which favors an OR advantage<br />
for Mandarin, or the Accessibility Hypothesis (Keenan and Comrie, 1977; Lin<br />
et al., 2005), which favors a global subject preference, might rise and fall as candidates<br />
for a theory consistent across languages. Before other theories that are based on<br />
canonicity or word order frequency can make reasonable predictions, further investigations<br />
of the Mandarin relative clause structure are necessary.<br />
The following section will discuss the structure of Mandarin RCs. Then the relevant<br />
theories are assessed on their predictions concerning English and Mandarin. After that,<br />
recent studies on the Chinese SRC/ORC difference will be discussed and<br />
their results will be compared to the predictions of the outlined theories. Finally, I will<br />
turn to the second topic: language-specific forgetting effects in center-embedding.<br />
2.2 Chinese Relative Clauses<br />
Relative clauses in Mandarin Chinese are head final, i.e., they precede the modified<br />
noun. The RC is attached to the noun with the intervening genitive marker de (gen),<br />
which here serves as a relativizer.<br />
(9) a. Mandarin SRC:<br />
[ei yaoqing fuhao dei] guanyuani xinhuaibugui.<br />
invite tycoon gen official have bad intentions<br />
V O S<br />
’The official who invited the tycoon has bad intentions.’<br />
b. Mandarin ORC:<br />
[fuhao yaoqing ei dei] guanyuani xinhuaibugui.<br />
tycoon invite gen official have bad intentions<br />
S V O<br />
’The official who the tycoon invited has bad intentions.’<br />
Subject extracted RCs (example 9a) start with the embedded verb, before which a<br />
subject gap is assumed that has to be filled with the head noun. The SRC’s surface<br />
structure is ‘V NO de NS’, where NO is the embedded object and NS the head noun<br />
serving as the RC subject. Object relatives (example 9b) start with the embedded<br />
subject and the object gap is assumed just before the relativizer. The general structure<br />
is ‘NS V de NO’, where the head noun (NO) serves as the RC object.<br />
The pre-nominal nature of Chinese RCs has three major structural consequences that<br />
distinguish these constructions from RCs in English and other languages and hence<br />
could lead to different theory predictions. The first difference is the position of the<br />
gap. In English the filler-gap distance is shorter in the subject relative, while the head-final<br />
nature of Chinese yields a shorter distance in object relatives. This and the fact<br />
that the gap precedes the filler should make a difference for memory-based accounts<br />
and gap-searching algorithms. Secondly, the head-final structure produces a temporary<br />
ambiguity, especially in the Chinese ORC. In English the start of a non-reduced RC is<br />
marked by a relative pronoun (e.g. that). In Chinese, because the relativizer follows<br />
the RC, the reader is not necessarily aware of the RC while reading it. Initially, the<br />
Chinese ORC has the form of a simple sentence. This should have consequences for<br />
parsing and prediction. Finally, the canonicity properties of object and subject RCs are<br />
swapped in Chinese. In contrast to English and other languages, where the SRC exhibits<br />
the canonical word order, it is the ORC in Chinese which resembles the SVO word<br />
order of simple sentences. Another consequence of the noun-preceding RC concerns the<br />
complexity of deeper embedding. Interestingly, in Chinese an SRC embedding produces<br />
the assumedly more complex center-embedding structure while ORC embedding results<br />
in an iterative linear structure.<br />
(10) a. Doubly embedded SRC (Hsiao and Gibson, 2003):<br />
[ei yaoqing [ej gojie faguan dej ] fuhaoj dei] guanyuani<br />
gap invite gap conspire judge gen tycoon gen official . . .<br />
V1 V2 N1 de1 N2 de2 N3<br />
’The official who invited the tycoon who conspired with the judge. . . ’<br />
b. Doubly embedded ORC (Hsiao and Gibson, 2003):<br />
[[fuhao yaoqing ei dei] faguani gojie ej dej ] guanyuanj<br />
tycoon invite gap gen judge conspire gap gen official . . .<br />
N1 V1 de1 N2 V2 de2 N3<br />
’The official who the judge who the tycoon invited conspired with. . . ’<br />
As can be seen in example (10), the doubly embedded SRC shows a recursive<br />
center-embedding dependency between the head noun and the related gap. In the doubly<br />
embedded ORC the dependency is linear. In a head-initial language like English, embedding<br />
results in the opposite complexity pattern.<br />
Psychological Reality and Locality The semantic interpretation of the Mandarin<br />
RC structure is the same as in other languages. However, their dramatic syntactic<br />
difference from post-nominal RCs raises the question of whether head-final constructions<br />
are syntactically comparable to head-initial RCs. A cross-linguistic processing theory<br />
for RCs should capture all kinds of RCs. This requires the captured structures to<br />
induce similar parsing processes. If Chinese did in fact not use comparable<br />
syntactic realizations, there would be no need for existing syntactic RC theories to fit the<br />
Chinese data. Indeed some researchers treat head-final RCs as adjective-like adjuncts not<br />
containing any gap (e.g. for Japanese: Matsumoto, 1997). This would make a filler-gap<br />
resolution process unnecessary in parsing Chinese RCs. However, Lin et al. (2005) (also<br />
reported in Lin, 2007) provided empirical evidence that Mandarin RC constructions are<br />
indeed gap-containing structures that are processed differently from adjunctive phrases.<br />
Their experiment contained Chinese possessor relative clauses (PRCs), which are similar<br />
to adjunctive phrases in surface structure. In contrast to RCs, PRCs do not contain an<br />
overt gap. In a canonical PRC like example (11) the region before the relativizer has<br />
the structure ‘N1 V _ N2’, with the covert possessor gap lying between the verb and<br />
the possessee.<br />
(11) huairen bangjia _ laopo de zongcai jueding baojing<br />
bad guys kidnap wife DE chairman decide call police<br />
‘The chairman whose wife some bad guys kidnapped decided to call the police.’<br />
The construction can be slightly changed to alter the gap position. By using the<br />
marker ba, gap and object can appear pre-verbally. Inserting the passive marker bei<br />
allows the possessor gap to be in the sentence-initial subject position. In a self-paced<br />
reading experiment Lin and colleagues controlled the material for three different positions<br />
of the potential gap and compared the reading times to adjunctive clauses. The results<br />
show that the reading speed on the head noun was dependent on the gap position only<br />
for the possessive RCs but not for the adjunctive clauses. The processing differences are<br />
interpreted in Lin (2007) as evidence for filler-gap dependencies in Chinese pre-nominal<br />
relative clauses. This makes them “psychologically real” (Lin, 2007; p. 9) and, hence,<br />
comparable to post-nominal RCs. Another crucial finding was that the reading time on the<br />
head noun of the PRCs was fastest in the bei condition. Notably, this is the condition<br />
having the possessor gap in subject position, making the filler-gap distance longer than<br />
in the other two conditions. This clearly contradicts a locality account.<br />
Elided Subject or Gap Assumption For some of the examined theories it is important<br />
to know whether the reader is aware of the initial subject gap in the Chinese SRC<br />
with the form ‘gap V N1 de N2’. The knowledge of the gap can affect integration and<br />
memory processes as well as structural predictions and the gap-searching mechanism.<br />
For example, if the SRC were the only construction in Mandarin that starts with ‘V N’,<br />
the reader would know immediately at the first word that he or she is reading an SRC. Addressing<br />
that question, Kuo and Vasishth (2007) performed a corpus study on the Sinica<br />
Corpus 3.0 (5 million words), which is summarized in table 2.2. Kuo and Vasishth found<br />
639 SRC-like structures (V N1 de N2), of which only 19% (119) were in fact subject<br />
relatives. The majority of the structures were possessive modifiers with an inanimate<br />
head noun (see example 12).<br />
(12) a. tisheng qiye de jingzhengli<br />
increase company gen competitiveness<br />
‘To increase the company’s competitiveness.’<br />
b. guyong yuangong de chengben. . .<br />
hire employee gen cost<br />
‘The cost of hiring an employee. . . ’<br />
SRC-like ORC-like<br />
V N1 de N2 Predicate N1 V de N2 Predicate<br />
N2 animate N2 inanimate N2 animate N2 inanimate<br />
N1 animate 13 51 3 42<br />
N1 inanimate 106 469 1 71<br />
Table 2.2: Table from Kuo and Vasishth (2007), summarizing their corpus study on<br />
RC-like structures. Bona fide RCs are indicated by numbers in bold.<br />
Considering these non-gapped structures and the existence of further non-gapped<br />
constructions in Mandarin that start with ‘V N’, the reader’s awareness of the gap is<br />
questionable. In Chinese, in an appropriate context, even mono-clausal structures with<br />
an elided subject (_ V N) are possible. However, since it is not clear how frequent these<br />
structures are, and since they mostly need special contexts, it is not clear how pro-drop<br />
mono-clauses affect the parser’s predictions. Because the question is not clearly resolved,<br />
Kuo and Vasishth consider both the Gap Assumption and the Elided Subject Assumption<br />
as competing hypotheses. Under the Gap Assumption the reader knows immediately that<br />
he or she is reading a subject relative. Under the Elided Subject Assumption, which<br />
receives stronger support from the corpus study, the predictions are rather unclear but<br />
should not involve a gapped structure.<br />
Garden Path Effects Lin (2007) reports evidence for garden path effects due to<br />
temporary ambiguities in both head-final object and subject RCs. Several reading time<br />
studies of Japanese and Mandarin Chinese show a facilitating effect on the relativizer<br />
and the head noun when the RC region is disambiguated earlier. The disambiguation<br />
was achieved by explicit marking, classifier mismatch (e.g. Hsu et al., 2006; Yoshida<br />
et al., 2004), RC-inducing contexts (e.g. Ishizuka et al., 2006), or explicit participant<br />
information (Lin and Bever, 2007). The facilitating effect suggests that without disambiguation<br />
a reanalysis happens at the region of the relativizer and the head noun because<br />
the parser expects a main clause. Seeing this effect not only in the ORC but also in the<br />
SRC provides further evidence for the Elided Subject Assumption favored by Kuo and<br />
Vasishth (2007).<br />
How are the special characteristics of the Mandarin RC involved in processing differences<br />
between subject and object extraction? Do they lead to a subject preference<br />
prediction or do they account for a deviance from a cross-linguistically consistent theory?<br />
What further properties of Mandarin Chinese have to be taken into account to gain<br />
useful predictions? I will now briefly review the theories concerning the subject/object<br />
difference that were presented in chapter 1 and make their predictions for Mandarin and<br />
English relative clauses explicit. Then, empirical studies concerning Mandarin RCs<br />
will be examined and related to the theoretical predictions.<br />
2.3 Predicting RC Extraction Preferences<br />
Cross-linguistically<br />
2.3.1 Memory<br />
DLT<br />
In the SRC in examples (1a) and (8a) the distance between the filler (the head noun)<br />
and its gap is short. The ORC (examples 1b and 8b), on the other hand, contains a<br />
distant dependency. The cost metrics as illustrated for English in figure 2.2 show that<br />
the Dependency Locality Theory accounts for higher difficulties in the ORC.<br />
SRC The reporter whoi ei attacked the senator admitted the error<br />
C(I) 0 0 0 0+1 0 0+1 3 0 0+1<br />
C(S) 2 1 3 2 2 1 1 1 0<br />
Total 2 1 3 3 2 2 4 1 1<br />
ORC The reporter whoi the senator attacked ei admitted the error<br />
C(I) 0 0 0 0 0 1+2 3 0 0+1<br />
C(S) 2 1 3 4 3 1 1 1 0<br />
Total 2 1 3 4 3 3 4 1 1<br />
Figure 2.2: DLT cost metrics for English object and subject relative clauses<br />
The example shows that Integration Cost on the embedded verb attacked is higher<br />
for the ORC because two integrations take place at that position. First, attacked is<br />
integrated with its subject senator, consuming 1 EU. Establishing the relation between<br />
the relative pronoun who and the empty element consumes 2 EU because two discourse<br />
referents (senator and attacked) have been processed meanwhile. In the SRC, integrating<br />
the empty element is cost-free, while the subject attachment of attacked uses 1 EU<br />
because attacked counts as one discourse referent. This results in a 3:1 cost<br />
disadvantage for the ORC, partly due to the fact that in the ORC all arguments are integrated<br />
at once. In the SRC the object integration of the RC happens later, at senator, consuming<br />
1 EU. Still, the difference with respect to the whole RC region is 3:2 and increases when<br />
Storage Cost is counted in. Storage Cost differs from position four on, where the word<br />
the in the ORC predicts four heads, while attacked in the SRC predicts only two. After<br />
The reporter who the, four heads are predicted: an embedded subject (since reporter is the<br />
object), a transitive verb (since the sentence has an object), an object gap, and the main<br />
verb. After The reporter who attacked, only the main verb and a direct object for the<br />
embedded clause are predicted. Summing the cost metrics together, we obtain a total cost of 12 units for<br />
the ORC compared to 9 units for the SRC throughout the RC. Relating processing cost<br />
to processing time, the DLT clearly predicts a processing advantage for SRCs on the whole<br />
embedded clause. Consistent with the King and Just (1991) study and the correction<br />
by Grodner and Gibson (2005), the DLT predicts no clause-type difference on the main<br />
verb. However, both studies show the longest reading times at the embedded verb of the<br />
ORC, whereas the DLT locates the peak difficulty at the main verb.<br />
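The Integration Cost computation walked through above can be made concrete. The sketch below is my own encoding, not from the thesis: integrating two elements is charged 1 EU per new discourse referent (noun or verb) processed since the dependent, counting the integrating word itself, and gaps are included as zero-cost tokens.

```python
# Sketch of DLT Integration Cost for the example sentences. Each token is
# flagged 1 if it introduces a new discourse referent (nouns and verbs).
# Each integration (dependent_index, site_index) is charged at the site:
# 1 EU per referent strictly after the dependent, up to and including the site.

def integration_costs(referent_flags, integrations):
    costs = [0] * len(referent_flags)
    for dep, site in integrations:
        costs[site] += sum(referent_flags[dep + 1 : site + 1])
    return costs

# SRC: The reporter who e(gap) attacked the senator admitted the error
src_flags = [0, 1, 0, 0, 1, 0, 1, 1, 0, 1]
src_arcs  = [(2, 3), (3, 4), (4, 6), (1, 7), (7, 9)]

# ORC: The reporter who the senator attacked e(gap) admitted the error
orc_flags = [0, 1, 0, 0, 1, 1, 0, 1, 0, 1]
orc_arcs  = [(4, 5), (2, 6), (1, 7), (7, 9)]

src = integration_costs(src_flags, src_arcs)   # [0,0,0,0,1,0,1,3,0,1]
orc = integration_costs(orc_flags, orc_arcs)   # [0,0,0,0,0,1,2,3,0,1]
print(sum(src[2:7]), sum(orc[2:7]))            # RC region: 2 3
```

With the gap folded into the adjacent verb column, these per-token costs reproduce the C(I) rows of figure 2.2, including the 3:2 RC-region difference and the equal cost of 3 EU at the main verb.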
All in all, a subject preference is predicted for English. Now, what does the DLT have<br />
to say about Chinese? Figure 2.3 shows the costs for Mandarin subject and object<br />
relative clauses as assumed by Hsiao and Gibson (2003).<br />
SRC ei yaoqing fuhao dei guanyuan xinhuaibugui.<br />
gap invite tycoon gen official have bad intentions<br />
C(I) 0 1 1 3 1<br />
C(S) 3 2 2 1 0<br />
Total 3 3 3 4 1<br />
ORC fuhao yaoqing ei dei guanyuan xinhuaibugui.<br />
tycoon invite gap gen official have bad intentions<br />
C(I) 0 1 0 1 1<br />
C(S) 1 1 2 1 0<br />
Total 1 2 2 2 1<br />
Figure 2.3: DLT Integration and Storage Cost for Mandarin relative clauses.<br />
Integration Cost mainly predicts a difference on the head noun guanyuan “official”.<br />
The reason is the greater distance between the head noun and the embedded gap position<br />
in the SRC. Also on the relativizer there is an ORC advantage, because at this point<br />
in the SRC the referred subject is integrated with the embedded verb. Storage Cost<br />
predicts higher difficulty for the SRC from the first word on. Hsiao and Gibson assume<br />
that at that position the reader already knows that an RC is following because the<br />
sentential subject is missing. In other words, the gap is overtly recognized and affects<br />
the prediction. For the same reason the second word predicts more heads in the SRC<br />
than in the ORC, namely the missing subject and a main verb. In the ORC, on the<br />
other hand, only a direct object is predicted due to the temporary ambiguity of the initial<br />
RC resembling the beginning of a main clause. The resulting prediction of the DLT is<br />
a higher processing cost on the first two words of the SRC compared to the ORC. This<br />
means an object preference for Mandarin head-final RCs. However, the SRC Storage<br />
Cost predictions on the first two words are questionable. They rest on the assumption<br />
that the RC structure is the most probable one, given a missing subject. This would be<br />
the case under the Gap Assumption. However, assuming that the DLT’s prediction choices<br />
correlate with corpus frequencies, it is very unlikely that the parser would predict a<br />
relative clause before seeing the relativizer. Consequently, under the Elided Subject<br />
Assumption the DLT would predict fewer heads on the first two words of the RC. Under<br />
this assumption the object preference should disappear in that region.<br />
Computational Models<br />
The CC-READER model rests on assumptions similar to the DLT’s, namely capacity limitation and<br />
integration-based decay. A simulation of King and Just’s study produced comparable<br />
results (see figure 2.4 from Just and Carpenter, 1992). However, the span × RC type<br />
interaction on the main verb and the last word in the RC region was not predicted<br />
(MacDonald and Christiansen, 2002). Specifically, there is a greater reading span effect<br />
in the ORC than in the SRC in the data, whereas the simulation results show no such<br />
difference. Additionally, as Lewis and Vasishth (2005) point out, CC-READER underestimates<br />
the difficulty on the ORC embedded verb compared to the main verb. All in all,<br />
a subject preference is predicted for English. For Mandarin, in accordance with the DLT,<br />
an object preference can be expected. The same would apply to the ACT-R sentence<br />
processing model, which makes predictions similar to those of CC-READER with respect to<br />
English RCs, albeit fitting the results slightly better.<br />
2.3.2 Expectation<br />
In expectation-based theories like Levy’s (2008), the subject/object difference is accounted<br />
for similarly to experience-based theories. On the surface, the highly frequent SRC structure<br />
receives a higher ranking in the probability distribution of continuations than the<br />
ORC. Consequently, an expectation-based approach would predict a subject preference<br />
in all languages where the SRC corpus frequency exceeds the ORC frequency, as is the<br />
case in English, German, and Mandarin. However, like experience-based theories, expectation<br />
is a framework without definite commitments. To gain detailed predictions,<br />
it is necessary to know the exact word-by-word likelihoods with respect to the grammar.<br />
Here again the Gap versus Elided Subject Assumptions play a theoretical role. How-<br />
Figure 2.4: CC-READER simulation results on English subject and object extracted<br />
relative clauses (figure from Just and Carpenter, 1992; p. 140).<br />
ever, in Mandarin both assumptions would most likely result in the same predictions.<br />
Assuming possible elided subjects in a main clause, both a main clause and an embedded<br />
RC could be expected in either construction. Thus the mere frequency of subject versus<br />
object extractions could indeed be decisive in this case, predicting a subject preference.<br />
On the other hand, an overt subject gap in the SRC as assumed in the Gap Assumption<br />
would lower the cost at the relativizer for the SRC. Due to the main clause ambiguity<br />
in the ORC a costly update of the plausibility ranking would happen at the relativizer,<br />
where the ORC becomes more likely than a main clause. This cost is lower in the subject<br />
extraction due to the higher ranking of an embedded SRC. Consequently, under the Gap<br />
Assumption a syntactic expectation theory would also predict a subject preference.<br />
S<br />
attacked<br />
the<br />
2.3.3 Canonicity<br />
the<br />
senator<br />
Figure 9. The number <strong>of</strong> cycles expended on various parts <strong>of</strong> the subject-relative sentences (on the left)<br />
and object-relative sentences (on the right) when the simulation, CC READER, is operat<strong>in</strong>g with more or<br />
less work<strong>in</strong>g memory capacity. (The bottom graph presents the human data for comparison with the<br />
simulation.)<br />
Considering Greenberg's classification as a basis, what would a canonical word order account predict for English and German relative clauses? English subject relative clauses exhibit the canonical SVO structure whereas object relatives use an OSV ordering. Therefore, a heuristic or base-generative canonicity theory would assign a higher processing cost to ORCs in English, which is consistent with empirical evidence. The same applies to German, agreeing with the widely accepted SOV classification. As illustrated in example (13), German SRCs have SOV ordering and would be preferred, whereas ORCs have an OSV ordering. If we considered an SVO basis, no clear predictions would be possible.
2.3 Predicting RC Extraction Preferences Cross-linguistically
(13) German SRC and ORC (Konieczny and Ruh, 2003):
a. Der Wärter, [der_i e_i]_S [den Häftling]_O [beleidigte]_V, entdeckte den Tunnel. (SRC)
'The guard who insulted the prisoner discovered the tunnel.'
b. Der Wärter, [den_i]_O [der Häftling]_S e_i [beleidigte]_V, entdeckte den Tunnel. (ORC)
'The guard whom the prisoner insulted discovered the tunnel.'
Where would the difficulties show up in reading studies? This is not easy to answer for such a general theory. The location of the effects depends mainly on the underlying parsing theory and how it deals with unexpected structures. Generally, one can assume that slowdowns appear as soon as the reader realizes that he or she is reading a non-canonical structure. In Mandarin this would be the initial verb. Considering a transformational account with canonical base-generation at deep structure, some of the difficulties would most likely appear after having read the whole embedded structure, because it then has to be reordered and integrated. Like English, Mandarin Chinese is also claimed to be SVO (Hsiao and Gibson, 2003; Kuo and Vasishth, 2007). Since ORCs resemble the canonical SVO order but SRCs have VOS, an object preference would be predicted for Mandarin. That means that a canonicity account based on the conventional SVO classification would not speak for a cross-linguistic subject preference but rather confirm Hsiao and Gibson's claim of Mandarin being an exception. As stated earlier, the location of the effects is not easy to determine.
2.3.4 Experience
Considering only RC type frequencies, there is a clear subject bias in most languages. But a mere comparison of SRC and ORC corpus frequencies is a rather abstract method and not psychologically motivated. A more comprehensive theory of experience would be driven by complex factors. Without a granularity commitment, clear predictions are impossible. As discussed in 1.3, the implementation in a connectionist network could shed light on the complex structural relations. MacDonald and Christiansen (2002) used a simple recurrent prediction network to predict individual and global differences in English subject and object relative clauses. The study will be discussed in detail in chapter 3 and shall only be briefly mentioned here. The network was trained on a simplified grammar of English to make word-by-word continuation predictions. It performed better on SRCs. Furthermore, the interactions found in the results are comparable to the reading span × RC type × region interactions in King and Just (1991) (see figure 3.3 in chapter 3 for details). MacDonald and Christiansen call this a frequency × regularity interaction, because the word-by-word predictions for the SRC benefited from its regularity, specifically its similarity in word order with main clauses. Thus, in this case the experience account can be seen as equivalent to the canonicity account, which makes the same predictions for the same reason. Here the connectionist implementation justifies the canonicity approach on the basis of experience. But this connection is not inherent. The experience account, specifically when implemented as a connectionist network, may well make divergent predictions with respect to canonicity assumptions. Regarding Mandarin RCs, Hsiao and Gibson (2003) are confident that regularity predicts an object preference. They also say that "[...] it remains an open question how to formalize this theory so that it makes more detailed predictions." (p. 14). Hsiao and Gibson go on to suggest implementing a theory of that kind in a connectionist system like that of MacDonald and Christiansen (2002) to settle the question. The discussion about SRC-like structures and elided subjects will be relevant for such a model. The modeling of Mandarin RC processing is the focus of this thesis and will be addressed in chapter 4.
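To make the architecture concrete, the following is a minimal Elman-style simple recurrent network (SRN) trained to predict the next word, the kind of model MacDonald and Christiansen (2002) used. The vocabulary, toy corpus, layer size, and training regime here are invented stand-ins for illustration, not the materials of the original study:

```python
import numpy as np

# Minimal Elman-style SRN for next-word prediction. The hidden state from
# the previous word serves as the "context" layer, as in classic SRNs.
rng = np.random.default_rng(0)

vocab = ["boy", "girl", "sees", "chases", "."]
idx = {w: i for i, w in enumerate(vocab)}
V, H = len(vocab), 10

# Invented toy corpus of grammatical word sequences.
corpus = [["boy", "sees", "girl", "."], ["girl", "chases", "boy", "."]] * 50

W_xh = rng.normal(0, 0.5, (V, H))   # input -> hidden
W_hh = rng.normal(0, 0.5, (H, H))   # context (previous hidden) -> hidden
W_hy = rng.normal(0, 0.5, (H, V))   # hidden -> output

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

lr = 0.1
for epoch in range(30):
    for sent in corpus:
        h = np.zeros(H)                      # context units reset per sentence
        for cur, nxt in zip(sent, sent[1:]):
            x = np.zeros(V)
            x[idx[cur]] = 1.0
            h_new = np.tanh(x @ W_xh + h @ W_hh)
            y = softmax(h_new @ W_hy)
            # One-step backprop with copy-back context (no full BPTT),
            # in the spirit of classic SRN training.
            t = np.zeros(V)
            t[idx[nxt]] = 1.0
            dy = y - t
            dh = (W_hy @ dy) * (1 - h_new ** 2)
            W_hy -= lr * np.outer(h_new, dy)
            W_xh -= lr * np.outer(x, dh)
            W_hh -= lr * np.outer(h, dh)
            h = h_new

def predict(prefix):
    """Return the network's next-word distribution after reading prefix."""
    h = np.zeros(H)
    for w in prefix:
        x = np.zeros(V)
        x[idx[w]] = 1.0
        h = np.tanh(x @ W_xh + h @ W_hh)
    return softmax(h @ W_hy)

dist = predict(["boy", "sees"])
print(vocab[int(np.argmax(dist))])  # the trained continuation of the toy corpus
```

Prediction error on such a network (e.g. cross-entropy against the actually occurring next word) is what gets mapped onto word-by-word reading difficulty; structures whose continuations the network has learned well, such as frequent and regular word orders, incur lower error.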
2.3.5 Other Explanations

Active Filler Strategy

The Active Filler Strategy (Frazier and Flores d'Arcais, 1989; Frazier and Clifton, 1989) accounts for difficulties and garden-path effects in the region between a filler and its gap. As soon as a filler is identified, the parser starts an active search for the appropriate gap. Intervening potential gap positions produce resource-consuming ambiguities. Following the strategy, in a relative clause the parser would try to insert the head noun as filler immediately after the relativizer that, because this is the first possible gap position. That is a successful strategy in the SRC but results in the need for reanalysis in ORCs. Therefore, the Active Filler Strategy predicts a higher processing cost on the noun and verb in ORCs due to reanalysis.

It is not clear what the Active Filler Strategy would predict for head-final relative clauses. Since the filler follows the potential gap positions, the gap search has to happen in a post-processing stage or perhaps by re-reading the embedded clause. Another question is then whether the search proceeds from the beginning of the RC or backwards, starting from the head noun. A reasonable assumption could be that the Active Filler Strategy would not apply at all when the whole phrase has already been seen. Other structural strategies like the one proposed by Lin and Bever (2006b) probably fit better into a head-final situation. The question of subject/object preference is then handed over to accessibility concerns.
Accessibility

A theory of accessibility comes in several versions. What is shared between them, and important for predictions concerning the subject/object difference, is that subjects are more easily accessible than objects. For example, Keenan and Comrie (1977) introduce an accessibility ordering for grammatical functions like the following:

subject > direct object > indirect object > . . .

This hierarchy is based on observed preferences for relativized NPs in a number of languages. The explanation is that subjects are more obligatory for predicates than objects are, and are therefore more predictable.
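Read as a partial order over grammatical functions, the hierarchy yields a trivial decision procedure for which relativization is predicted easier. A minimal encoding (only the functions named above are included):

```python
# Keenan and Comrie's (1977) accessibility hierarchy as an ordered list;
# an earlier position means more accessible, hence easier to relativize.
HIERARCHY = ["subject", "direct object", "indirect object"]

def easier_to_relativize(a, b):
    """True if relativizing on function a is predicted easier than on b."""
    return HIERARCHY.index(a) < HIERARCHY.index(b)

print(easier_to_relativize("subject", "direct object"))  # True
```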
Lin, Fong, and Bever (2005) and Lin and Bever (2006b) suggest that the subject position is higher up in the syntactic tree structure than the object position. They propose an incremental minimalist parser (IMP) that performs a top-down search through the tree from the filler to the gap. The search starts at the head noun and proceeds downward, looking for a c-commanded trace. Since subjects are higher in the tree, this mechanism makes sure that subject traces are always accessed first, irrespective of filler-gap distances or specific word orders: "This top-down searching mechanism overrides the effect of NP recency (i.e. linear locality), and passive complexity (i.e. canonicity)." (Lin et al., 2005; p. 11). Since an Accessibility Theory is independent of word order and locality, it would predict a cross-linguistic preference for subject extractions.
The top-down gap-searching mechanism makes the same predictions for pre- and post-nominal RCs. Consequently a subject preference is also predicted for Mandarin RCs. The facilitation effect in subject extraction would occur on the head noun, where the gap-searching mechanism is initiated. Supporting evidence for this structural account is provided by the PRC experiment by Lin et al. (2005) reported above: the easiest condition was the one where the gap is in subject position. There was no significant difference on the head noun between the other two conditions.
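The effect of the top-down search can be sketched with a toy tree walk: because the subject attaches higher than the object, a depth-first walk from the root reaches a subject gap before an object gap, whatever the linear distance to the filler. The tree encoding and node labels below are a simplification of my own, not Lin et al.'s actual parser:

```python
# Toy clause structure: the subject NP sits above the VP, so a top-down
# depth-first, left-to-right walk visits the subject gap position before
# the object gap position, regardless of linear word order.
clause = ("IP",
          ("NP", "subject-gap"),      # structurally higher position
          ("VP",
           ("V", "verb"),
           ("NP", "object-gap")))     # structurally lower position

def first_gap(node):
    """Return the first gap label found in a top-down depth-first walk."""
    if isinstance(node, str):
        return node if node.endswith("-gap") else None
    for child in node[1:]:
        found = first_gap(child)
        if found:
            return found
    return None

print(first_gap(clause))  # the subject gap is reached first
```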
Perspective Shift

A more pragmatic explanation is provided by the theory of perspective shift (MacWhinney and Pleh, 1988; MacWhinney, 1982; 1977). The comprehender preferentially adopts the perspective of a sentential subject. Consequently, when the subject changes, the comprehender has to shift his or her perspective. An object relative clause demands a perspective shift from the main clause subject to the RC's subject and then back to the main clause subject after completion of the embedded structure. In subject relatives the subject is the same for both clauses and hence no shifting is required. Perspective shifting demands processing resources. That makes ORCs costlier to process, because two shifts are necessary there while none are needed in SRCs. A slowdown in ORC reading is predicted on the embedded NP (first shift) and the main verb (second shift).
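The predicted cost difference amounts to counting perspective changes over the sequence of clause subjects the reader adopts. The sequences below paraphrase the English SRC/ORC examples discussed above and are my own encoding of the account:

```python
def count_shifts(perspectives):
    """Count how often the adopted perspective changes across the sentence."""
    return sum(1 for a, b in zip(perspectives, perspectives[1:]) if a != b)

# English SRC "the reporter that attacked the senator admitted the error":
# the perspective stays on the main clause subject throughout.
src = ["reporter", "reporter", "reporter"]

# English ORC "the reporter that the senator attacked admitted the error":
# shift to the embedded subject and back to the main clause subject.
orc = ["reporter", "senator", "reporter"]

print(count_shifts(src), count_shifts(orc))  # 0 vs. 2 shifts
```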
The perspective shift account predicts a subject preference in most languages with post-nominal RCs. For pre-nominal RCs in a language like Mandarin the pattern would change. In Mandarin ORCs the subject of the RC is in initial position, followed by a verb. Before the head noun is seen, a main clause is predicted. Thus one shift is necessary to change perspective from the embedded noun to the head noun, which is now the subject. As for Mandarin SRCs, the locus of perspective depends on the predicted structure when reading the 'V N' sequence. Predicting a gapped structure, i.e., being aware of the SRC, would create the expectation of a subject head noun. This would not require any shifting. However, what happens when the gap is not recognized? Could perspective possibly center on the RC object, which is the only NP available, wrongly interpreting it as an object? If so, reading an SRC would also require one shift, namely from the embedded NP to the sentential subject. In the number of shifts required, this makes SRCs as hard as ORCs. In addition to the perspective shift, a reanalysis would be expected in the SRC. The answers to these questions concerning perspective shift in Chinese depend on the mechanism guiding the reader's perspective when the subject is ambiguous or absent. To sum up, the perspective shift account could probably account for a subject advantage in Mandarin, but this is not clear. If so, an effect is expected on the head noun.
2.3.6 Summary

Table 2.3 shows an overview of the theories addressed here and their predictions regarding English and Mandarin RCs. All mentioned theories agree on a subject preference for English. However, a heterogeneous picture appears on the Mandarin side. There is a slight bias in the prediction pattern in favor of a subject preference, which would integrate nicely into an otherwise universal consistency. Accessibility, Expectation, Perspective Shift, and pure RC type frequency predict a clear subject preference, whereas Canonicity, Integration Cost, and Storage Cost under the Gap Assumption predict an ORC advantage. Storage Cost under the Elided Subject Assumption would predict a subject advantage on the RC region and an object advantage on the head noun. The predictions of the Active Filler Strategy are unclear. As for experience, the predictions are not clear due to the granularity problem. A connectionist implementation, as follows in chapter 4, is believed to make more specific predictions. Anticipating the results, the simulations predicted a weak ORC preference, which appeared, however, only at the relativizer. Accounting for the corpus data by Kuo and Vasishth (2007) even caused a subject preference in the RC region. To assess the compatibility of the theories just discussed with empirical data, the next section reports important studies on the subject/object difference in Chinese.
2.4 The RC Extraction Preference in Mandarin

Hsiao and Gibson (2003)
The self-paced reading study by Hsiao and Gibson (2003) was the first to report results addressing the subject/object difference in Chinese. It had great impact on the discussion about the universality of the subject preference across languages, because Chinese was the first exception discovered. Hsiao and Gibson studied singly and doubly embedded Mandarin relative clauses like the ones in examples (2.2) and (10) in a self-paced reading task. For single embedding they found an advantage for ORCs on the region before the relativizer (N1 V1 / V1 N1). For the double-embedded RCs the relevant regions were the 3rd and 4th word (de1 N2 / N1 de1) and the 5th and 6th word (V2
de2 / N2 de2). On both regions an object advantage was measured. Both singly and doubly embedded RCs show an object preference.

Theory              E-S    C-S    C-O
Canonicity           √             √
IC                   √             √
SC + Gap             √             √
Accessibility        √      √
Expectation          √      √
Perspective          √      √
RC Frequency         √      √
Active Filler        √      ?      ?
SC + Elided Subj.    √      √     (√)
Experience           √     (√)    (√)

Table 2.3: Extraction preference predictions for English and Mandarin RCs. E-S = English subject preference. C-S and C-O = Chinese subject or object preference, respectively. IC = Integration Cost, SC = Storage Cost.

According to Lin (2007) a garden
path effect is expected in both RC types due to an initial misinterpretation as a main clause. A reanalysis should take place at the relativizer/head noun region, especially in the ORC, leading to higher reading times. Interestingly, Hsiao and Gibson's data do not show such an effect. Nevertheless, Hsiao and Gibson argue for an initial misanalysis in the ORC but not in the SRC. Under these premises their result is consistent with the Storage Cost account when interpreted in terms of the Gap Assumption: in the SRC more heads are predicted, since an RC is expected, while in the ORC fewer heads are predicted according to a main clause interpretation. Hsiao and Gibson assume that no reanalysis is necessary because the already constructed main clause structure in the ORC does not have to be modified to be attached as a relative clause. Concerning Integration Cost, the results do not support the theory. As there is no significant effect on the head noun, the results do not show the predicted Integration Cost due to a longer filler-gap distance in the SRC. A naive application of the canonicity account would fit the data well. The more canonical ORC structure, which resembles the main clause word order, seems indeed easier to process. However, as Hsiao and Gibson state themselves, it remains unclear what canonicity and structural frequency accounts predict without reasonable evidence for what counts as canonical or what structural level feeds into frequency calculations.
There are five major issues in Hsiao and Gibson's study that can be criticized.
a) Animacy Hsiao and Gibson only used animate NPs in their stimuli. Kuo and Vasishth's corpus study (reported in 2.2) revealed that hardly any RC-like structures actually contain two animate NPs. In cases with an animate embedded noun, the most frequent continuation contains an inanimate head noun. Only 2% of the SRC-like structures (V N1 de N2) were SRCs with two animate nouns. Of the ORC-like structures found (N1 V1 de N2), about 39% were ORCs, and about 93% of these involved an inanimate head noun, leaving only 7% to the structures Hsiao and Gibson studied. This has two consequences. First, the stimuli used by Hsiao and Gibson (2003) are not natural relative clauses but rarely occurring constructions that most readers never come across. This may cause a confound in the results. Second, the overall unexpected animacy of the head noun would induce a surprisal effect at this position, which has not been found by Hsiao and Gibson.
b) Clause Type vs. Embedding Confound in Double-embedding As mentioned in 1.2 and 2.2, double embedding introduces special complexities which go beyond the word order differences of single embedding. In particular, Mandarin SRC recursion produces a center-embedding structure while object-extracted embedding results in serial dependencies. This makes multiply embedded ORCs easier than SRCs. Hence, effects discovered in multiple embedding might be due to a difference in complexity which does not exist in single embedding. Lin and Bever (2006b) interpret this as a confound, questioning the contribution of double-embedding studies, whereas Kuo and Vasishth (2007) do not agree with that view.
c) Gap Assumption Hsiao and Gibson's explanations for their results rest on the Gap Assumption. However, the evidence provided by the corpus study of Kuo and Vasishth makes that assumption appear unlikely. Relative clauses make up only about 20% of structurally similar occurrences. Judging by corpus frequencies, a gapped RC structure seems to be the least probable expectation when reading either of the two RC types.
d) Syntactically Ambiguous Verbs Lin and Bever (2006b) criticize Hsiao and Gibson's study for using verbs that were unbalanced regarding their syntactic arguments: in addition to direct objects, 7 of the verbs took sentential complements and 13 took verbal complements.
e) Inconclusive RC Region Lin and Bever (2006b) claim that the pre-relativizer region is inconclusive with respect to differences concerning relative clauses because the reader is not yet aware of the RC at that point. Lin and Bever attribute the difficulty in SRCs to the missing subject in an allegedly regular sentence. However, it is not so clear whether it is right to call this a confound rather than a possible explanation for processing difficulties in the RC. Attributing specific effects to relative clause processing does not require the reader to know what structure he or she is actually reading.
Lin and Bever (2006a)

Addressing the ambiguous-verbs confound in Hsiao and Gibson (2003), Lin and Bever (2006a) conducted a self-paced reading experiment with verbs that only took nominal objects. They controlled for RC type and also for whether the RC modified the subject or the object of the matrix clause. In all conditions there was a subject preference on the relativizer and the head noun. No effect was observed on the RC region. The result contradicts the study of Hsiao and Gibson, as the overall preference is the opposite. Strictly speaking, however, it is not necessarily a contradiction, since the locations of the effects do not overlap in the two experiments. It is therefore theoretically possible that both results are consistent.
Lin and Garnsey (2007)

The reading time study by Lin and Garnsey (2007) provided evidence that ORCs are easier to comprehend. In addition, they showed that animacy information is an important factor in the comprehension process and is used very early by the reader. Their stimuli were Mandarin RCs with another noun following the head noun. The head noun could optionally be omitted. When the head noun was dropped, the second noun could ambiguously be interpreted as the head noun. The confusability of the two nouns was controlled for by animacy. The plausibility of an animate RC head noun compared to an inanimate one was increased by semantic implications of the embedded verb. The results show that animacy information was used immediately to resolve the ambiguity in both conditions (with and without head noun). Subject extraction was more difficult than object extraction in all conditions. The conditions with missing head nouns were the most difficult. When the nouns were confusable, the differences between subject and object extraction were also found in regions after the head noun, pointing to an interaction of RC type with similarity-based interference (Gordon et al., 2006).
Kuo and Vasishth (2007)

Kuo and Vasishth (2007) conducted a self-paced reading experiment that used the singly embedded stimuli of Hsiao and Gibson (2003). Additionally, Kuo and Vasishth added two more conditions to further assess the validity of the Gap Assumption as adopted by Hsiao and Gibson (2003) and of the Storage Cost predictions. The two extra conditions are shown in example (14). In (14b) the ORC is fronted by the passivization marker bei, which is ungrammatical in front of a main clause out of context. Thus, inserting bei removed the main clause ambiguity in the ORC. On the other hand, inserting the demonstrative zheige pre-verbally in the SRC (example 14a) makes the subject gap obvious. The demonstrative in combination with the verb raises the expectation for a relative clause, because a noun is needed to fill the gap. Hence the possibility of the structure continuing as a main clause with an elided subject is excluded.
35
Chapter 2 Issues <strong>in</strong> Relative Clause Process<strong>in</strong>g<br />
(14) a. Mandarin SRC with demonstrative:
[Zheige [e_i yaoqing fuhao de_i] guanyuan_i] xinhuaibugui.
this-CL invite tycoon GEN official have-bad-intentions
'The official who invited the tycoon has bad intentions.'
b. Passivized Mandarin ORC:
[Bei fuhao yaoqing e_i de_i] guanyuan_i xinhuaibugui.
BEI tycoon invite GEN official have-bad-intentions
'The official who the tycoon invited has bad intentions.'
If the Gap Assumption is correct and the reader is aware of the gap in the SRC, no
difference in reading time is predicted between the demonstrative condition and the one
without zheige. As for the ORC, inserting the passivization marker should increase the
difficulty in the RC region if the Storage Cost predictions are correct. The reason
is that more syntactic heads would be stored in memory when an RC is predicted rather
than a main clause, as assumed by Hsiao and Gibson. In short, under Hsiao and Gibson's
assumptions the zheige condition should have no effect, whereas the bei condition should
increase the difficulty in the ORC. The results showed an overall subject preference
in the total reading times. The preference was mainly found on the relativizer and
the head noun (de N2). No effect was found on the RC region before the relativizer.
This is consistent with Lin and Bever (2006a). On the relativizer (and on the preceding
region, but not significantly) the SRC condition without the determiner (SR-no-det) was
easier than the condition with the determiner (SR-det). This is not consistent with the
Gap Assumption, but it is consistent with the Elided Subject Assumption: the Storage Cost
hypothesis under the Elided Subject Assumption would predict fewer syntactic heads in the
SR-no-det condition, since not an RC but a main clause is expected. This confirms the
prediction of the corpus frequency data. The initial misinterpretation in the SR-no-det
condition is supported by another finding: on the regions following the head noun,
increased difficulty was observed in the SR-no-det condition. This could be caused by
reanalysis following the misinterpretation as a main clause. A difference between the
OR-bei and OR-no-bei conditions was only observed on the relativizer; in particular,
the no-bei condition was harder. This points to a reanalysis process due to initially
misinterpreting the ORC as a main clause. However, no effects pointing to a misanalysis
were observed in the pre-relativizer region.

To summarize, the ORC data of Kuo and Vasishth's study do not support the Storage
Cost Account, since being aware of an upcoming ORC did not increase difficulty in
the RC region. However, evidence for an initial misinterpretation was found in the
reanalysis effect on the relativizer. The SRC data support a Storage Cost Account only
under the Elided Subject Assumption, meaning that readers initially interpret SRCs as
main clauses with an elided subject. Altogether, a subject preference effect was found,
located on the relativizer and head noun region, as in Lin and Bever (2006a).
2.4 The RC Extraction Preference in Mandarin
Qiao and Forster (2008)

The divergent results from the studies discussed so far are not necessarily contradictory,
because no exactly opposite effects were found in comparable regions. Rather, some
studies found effects where others did not. Across the studies, a subject
preference was only found at the relativizer and head noun region, while an object
preference was mainly observed on the region before the relativizer in the embedded RC.
Qiao and Forster (2008) claim that both findings are consistent. They argue that the readers
in Hsiao and Gibson's and Kuo and Vasishth's experiments adopted different strategies
that led to the contradicting results. According to Qiao and Forster, two strategies are
possible in SPR experiments: a) a "wait-and-see" strategy, in which readers do not commit
to a specific structure early in the sentence, and b) a more careful processing of the RC.
Effects that may be delayed under the first strategy would show up under the second. In
particular, under the careful strategy the SRC structure should be recognized, causing higher
difficulty. This would, on the other hand, decrease the difficulty at the relativizer and
the head noun because no reanalysis takes place. Qiao and Forster used the Maze Task 2
(Forster, Guerrera, and Elliot, 2008), which forced readers to adopt the more careful
strategy. The results showed an object preference in the relative clause (as in Hsiao
and Gibson, 2003) and a subject preference at the relativizer (consistent with Kuo and
Vasishth, 2007). Overall, a slight advantage for the ORC was found. The explanation
offered by Qiao and Forster is that in Kuo and Vasishth's study readers adopted
the wait-and-see strategy, which avoided the difficulties in the SRC. This means that
readers do not really predict a main clause during the online reading of an SRC, as stated
by the Elided Subject Assumption; rather, readers do not make any prediction at all.
Considering that the subject preference on the relativizer was not significant by items,
the results are consistent with Hsiao and Gibson's data. It is, however, not clear why
the participants in Hsiao and Gibson's study adopted a different parsing strategy than
the readers in Kuo and Vasishth's study, when both used the same method.
Summary

Many aspects of Hsiao and Gibson's initial study were subject to critique. Subsequent
results were equally distributed between object and subject preference. See
table 2.4 for a summary of the results on the RC extraction preference in Mandarin
Chinese. The table also shows which of the discussed theories are consistent with the
studies. Storage Cost under the Elided Subject Assumption (SC+ES) is not consistent
with any study. Recall that the theory would predict no effect in the RC region and an
object advantage on the relativizer and head noun. This does not fit any of the empirical
findings. The Active Filler Strategy is omitted in the table because it does not make
clear predictions for head-final RCs.

2 In the Maze Task the reader has to choose between two words at each point in the sentence. Only
one of the two words is a grammatical continuation. The reader is thus forced to predict a complete
structure.
             HG03    LB06    LG07    KV07    QF08
Preference    O       S       O       S      O (S)
Region        RC     de N     RC     de N   RC (de)

[The per-study consistency checkmarks for the theory rows (Canonicity, IC, SC+GA,
Access., Expectation, Persp.+ES, Frequency, SC+ES, Experience) could not be recovered
from the source layout; SC+ES receives no checkmark in any column, and Experience is
marked as partially consistent, (√), in three columns.]
Table 2.4: Studies of the RC extraction preference in Mandarin and their consistency
with the discussed theories. HG03 = Hsiao and Gibson (2003), LB06 = Lin and Bever
(2006a), LG07 = Lin and Garnsey (2007), KV07 = Kuo and Vasishth (2007), and
QF08 = Qiao and Forster (2008). IC = Integration Cost, SC = Storage Cost, GA =
Gap Assumption, and ES = Elided Subject Assumption.

I turn now to the second phenomenon to be addressed in this thesis: effects of forgetting
during the processing of complex nested structures.
2.5 Forgetting Effects

2.5.1 The Grammaticality Illusion

Complex nested structures like center-embedded relative clauses are very difficult to
process. Grammaticality rating studies show that these structures are often judged as
ungrammatical. Memory-based theories (Gibson, 1998; 2000; Just and Carpenter, 1992;
Lewis and Vasishth, 2005; Lewis et al., 2006) explain this by the excessive capacity load
evoked by a number of unbounded dependencies that have to be held in memory. The
DLT (Gibson, 2000) predicts parsing slow-downs due to the storage of complex predictions
and decay processes in distant dependencies. Capacity limitations are commonly seen
as cross-linguistic constraints that underlie all sorts of language processing. Hence the
predictions of memory-based theories are language-independent. However, a study by
Vasishth et al. (2008) casts doubt on that claim's validity. Their experiment suggests
that the robustness of memorized representations and related decay effects may well
depend on language-specific grammatical properties. The experiment concerned
the so-called grammaticality illusion in ungrammatical center-embedded structures.

Example (15) shows a sentence pair discussed in Frazier (1985). (15a) is a grammatical
sentence containing a doubly embedded ORC. The center-embedding produces
three consecutive verb phrases (VPs) completing the three clauses from the innermost
to the outermost. In (15b) the second verb phrase was cleaning every week is dropped,
which makes the sentence ungrammatical. I will call the condition in (15b) the drop-V2
condition.

(15) a. The apartment that the maid who the service had sent over was cleaning
every week was well decorated.
b. * The apartment that the maid who the service had sent over was well
decorated. 3

The surprising observation (attributed to Janet Fodor) was that the ungrammatical
sentence (15b) does not only appear grammatical to English readers; most readers
even judge it better than the grammatical version of the sentence. The finding gained
support from an acceptability rating study by Gibson and Thomas (1999). For their
study Gibson and Thomas used stimuli that either contained all three VPs or lacked one
of them. The results showed that omitting the second VP caused readers
to rate the sentence as acceptable as the grammatical one containing all VPs. A further
rating experiment conducted by Christiansen and MacDonald (1999) even showed
a higher acceptability for the drop-V2 condition than for the grammatical condition.
The qualitative difference from Gibson and Thomas' study is explainable by the method
Christiansen and MacDonald used. They carried out a so-called "stop-making-sense"
task, which is self-paced word-by-word reading with periodic requests for a grammaticality
rating. The SPR task prevents the participants from re-reading the sentence. This
kind of quasi-online measure may be the cause of the lower rating of the grammatical
but complex center-embedding. As an explanation for the grammaticality illusion, Gibson
and Thomas (1999) propose that the high memory load causes the reader to forget
the second NP (the maid) and with it the prediction of the second VP (was cleaning
every week). Gibson and Thomas basically offer two hypotheses: a) the high memory
cost pruning hypothesis and b) the recency/primacy account. The two approaches
are restated by Vasishth et al. (2008) as the VP-forgetting Hypothesis and the
NP-forgetting Hypothesis, respectively.

a) The VP-forgetting Hypothesis The original High Memory Cost Pruning Hypothesis
rests on the assumptions of the SPLT (Gibson, 1998), the predecessor of the DLT. The
major proposition as stated by Gibson and Thomas (1999) is the following:

(16) The high memory cost pruning hypothesis:
At points of high memory complexity, forget the syntactic prediction(s) with the
most memory load.

3 By convention an asterisk (*) indicates ungrammaticality of a sentence.
According to Gibson and Thomas, exceeding a theoretical memory capacity limit through
excessive load causes a loss of costly predictions. A successful parse is possible as long as
memory demands throughout the sentence stay within a certain capacity range. However,
when high complexity causes the load to exceed the limit, a breakdown of the
parser has to be prevented by pruning activation. Given the discrete nature of the
SPLT, this means that the predictions of certain syntactic categories have to be dropped.
The pruning hypothesis assumes that the predictions to be forgotten are those causing
the biggest part of the SPLT memory cost at the current point in the sentence. In example
(17) the point of highest memory cost is the most deeply embedded subject the clinic (NP3).
At this point two predictions are held in memory: VP2 predicted by NP2 and VP3
predicted by NP3. Since VP2 is further up in the sentence and has to be held in memory
longer than the subsequent VP3, it causes more memory cost. Consequently, the
prediction of the second VP gets pruned and is therewith forgotten.

(17) a. [The patient]_NP1 who_i [the nurse]_NP2 who_j [the clinic]_NP3 [had hired e_j]_VP3
[admitted e_i]_VP2 [met Jack]_VP1.
b. * [The patient]_NP1 who_i [the nurse]_NP2 who_j [the clinic]_NP3 [had hired e_j]_VP3
[met Jack]_VP1.
Vasishth et al. (2008) restate the pruning hypothesis in terms of decay as defined in
the DLT (Gibson, 2000) and refer to it as the VP-forgetting Hypothesis. Vasishth et al.
calculate Integration and Storage Cost at the three VPs to determine the "point of greatest
difficulty" in the sentence. The DLT cost predictions for example (17) are illustrated
in figure 2.5. At the first VP (VP3) two integrations take place: the object the nurse,
with two intervening discourse referents (clinic and hired), and the subject the clinic, with
one intervening discourse referent (hired), are integrated. At this moment two
active predictions are held in memory: the predicate of the upper RC (admitted), triggered
by reading nurse, and the main verb. This makes a total cost of 4. At the second verb
(admitted) the object the patient and the subject the nurse are integrated. The patient
has a distance of four discourse referents (nurse, clinic, hired, and admitted) from the
verb, the subject nurse is separated by three, and just the matrix verb is predicted. This
makes a total memory cost of 8 at the VP2 site. Finally, on the third VP, integrating
the patient and predicting a direct object incurs a cost of 6. Concluding from the
calculations, VP2 has the highest memory cost and, hence, is forgotten.
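The distance counting behind the Integration Cost values in figure 2.5 can be sketched in a few lines. The following is my own illustrative simplification, not Vasishth et al.'s implementation: the cost of integrating a dependent at a verb is taken to be the number of discourse referents introduced after the dependent, up to and including the integrating verb.

```python
# Illustrative sketch (not Vasishth et al.'s code): DLT-style Integration
# Cost for "The patient who the nurse who the clinic had hired admitted
# met Jack."  Cost = number of discourse referents introduced after the
# dependent, up to and including the integrating verb (figure 2.5, C(I)).

# discourse referents (nouns and verbs) in linear order
referents = ["patient", "nurse", "clinic", "hired", "admitted", "met"]

def integration_cost(dependent: str, verb: str) -> int:
    """Referents between `dependent` and `verb`, verb included."""
    return referents.index(verb) - referents.index(dependent)

# at "had hired" (VP3): object `nurse` and subject `clinic` are integrated
print(integration_cost("nurse", "hired"), integration_cost("clinic", "hired"))        # 2 1
# at "admitted" (VP2): object `patient` and subject `nurse` are integrated
print(integration_cost("patient", "admitted"), integration_cost("nurse", "admitted"))  # 4 3
# at "met" (VP1): subject `patient` is integrated
print(integration_cost("patient", "met"))                                              # 5
```

Adding the number of open predictions (the Storage Cost row) to the 4+3 at admitted and the 5 at met Jack reproduces the totals of 8 and 6 discussed above.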
The difference between Vasishth et al.'s and Gibson and Thomas' accounts is that
the latter added Storage Cost on the noun and Integration Cost of the predicted verb,
whereas Vasishth et al. simply use the total cost on the verb. The predictions, however, are
the same. Let me try to reformulate the decay approach more intuitively. The central
measure of the decay approach is Integration Cost. By counting the number of intervening
discourse referents, it is a discrete, indirect measure of time. Or, as Vasishth et al. put
it: it is "a discretized abstraction over some activation decay function that determines
the strength of a memorial representation." Hence, decay could be described as a function
of time and intervening memory load, with the assumption that a high memory load
The patient who the nurse who the clinic had hired admitted met Jack.

         had hired   admitted   met Jack
C(I)        2+1         4+3         5
C(S)         2           1          1
Total        4           8          6

Figure 2.5: DLT memory cost for the three VPs in a doubly embedded ORC.
increases the speed of decay. In our example, high memory load due to non-integrated
discourse referents arises after the source of the VP2 prediction (nurse). This results
in a steeper slope of the decay function, causing the representation of the VP2 prediction
to fall below a certain threshold. The forgetting of the VP2 prediction would account
for the good rating of the ungrammatical condition in the following way: the absence
of the VP in the drop-V2 condition goes unnoticed and causes no surprise. In addition,
the distance between VP1 and its dependent is smaller, which facilitates retrieval. In
the grammatical condition, on the other hand, the occurrence of the unpredicted VP2
causes parsing failure. Comparing both conditions in a reading time study should show
differences at the matrix verb and the following region.
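This intuition can be made concrete with a toy decay function. The sketch below is my own illustration, not part of the DLT: a prediction's activation decays exponentially at a rate scaled by the concurrent memory load, so the long-distance VP2 prediction falls below a retrieval threshold while the short-distance VP3 prediction survives. All constants are arbitrary.

```python
# Toy illustration (not part of the DLT): exponential decay whose rate
# scales with concurrent memory load.  A prediction held over a long
# stretch under high load drops below a retrieval threshold.
import math

def activation(elapsed: int, load: int, base_rate: float = 0.25) -> float:
    """Strength of a stored prediction after `elapsed` time steps
    (e.g. intervening discourse referents) under memory `load`."""
    return math.exp(-base_rate * load * elapsed)

THRESHOLD = 0.3  # arbitrary retrieval threshold

# VP2 prediction (made at `nurse`): three referents elapse before its verb
vp2 = activation(elapsed=3, load=2)
# VP3 prediction (made at `clinic`): only one referent elapses
vp3 = activation(elapsed=1, load=2)
assert vp2 < THRESHOLD < vp3  # VP2 is forgotten, VP3 is retained
```

With these (arbitrary) constants, the same elapsed distance under a lower load would leave the VP2 prediction above threshold, which is the sense in which load is claimed to steepen the decay slope.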
b) The NP-forgetting Hypothesis Gibson and Thomas (1999) and Vasishth et al.
(2008) additionally mention a possible serial order effect on maintaining several NPs in
memory. Evidence from cross-domain studies on human short-term memory (e.g. Henson,
1998; Baddeley, 1997; Lewis, 1996) shows a recency/primacy preference, making
the most recent and earliest items easier to recall. This suggests that the representational
strength of memorized items exhibits a U-shaped pattern with respect to their serial
order, making middle items harder to maintain than the rest. Assuming that a
recency/primacy preference applies to the memorizing of noun phrases, this account leads to
the NP-forgetting Hypothesis as follows. High memory load causes the lowest activated
middle NP (NP2) to be forgotten. This results in retrieval failure at VP2 in the grammatical
condition, whereas in the drop-V2 condition no retrieval can be triggered, since
the respective verb is missing. As a result, the grammatical sentence is perceived as more
difficult. The effects of the NP-forgetting Hypothesis should also occur on the matrix
verb and beyond.
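A U-shaped serial-position curve of this kind can be sketched as the sum of a primacy boost decaying from the first item and a recency boost decaying from the last; the weights below are illustrative, not empirically fitted.

```python
# Illustrative sketch of a U-shaped serial-position curve: primacy and
# recency boosts fall off geometrically from the edges, so of the three
# stored NPs the middle one (NP2) ends up weakest.  Weights are arbitrary.
def strength(pos: int, n: int, primacy: float = 0.5, recency: float = 0.6,
             falloff: float = 0.5) -> float:
    """Representational strength of the item at position `pos` of `n`."""
    return primacy * falloff ** pos + recency * falloff ** (n - 1 - pos)

nps = [strength(p, 3) for p in range(3)]  # NP1, NP2, NP3
assert nps[1] == min(nps)  # the middle NP is the weakest, hence forgotten first
```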
English

Vasishth et al. (2008) assessed NP-forgetting and VP-forgetting by measuring online reading
time in SPR and eyetracking. Besides the grammaticality control, NP-similarity contrasts
as in example (18) were used to detect NP-forgetting effects. In (18a) all three NPs
are highly confusable, whereas in (18b) the second noun is inanimate, which reduces the
similarity. Following Vasishth et al., high similarity (18a) predicts encoding interference
at NP3 and storage and retrieval interference at VP3. If NP2 is not forgotten,
interference effects should also be seen in the rest of the sentence in the high-interference
condition. However, assuming that the representation of the second NP has decayed
by the middle verb at the latest, no further interference effect should occur. Consequently,
the NP-forgetting Hypothesis predicts that differences between the high-interference
condition (18a) and the low-interference condition (18b) should disappear after the first
verb.

(18) a. The carpenter who the craftsman that the peasant carried hurt supervised
the apprentice. (high-interference)
b. The carpenter who the pillar that the peasant carried hurt supervised the
apprentice. (low-interference)
The results of the English SPR and eyetracking experiments showed weak, mostly
non-significant support for the NP-forgetting Hypothesis. Although non-significant, there was a
clear numerical reading time effect of similarity-based interference, which disappeared at
V2, pointing to a forgetting of NP2 that reduced the interference. The VP-forgetting
Hypothesis was fully confirmed, as can be seen in figure 2.6. The drop-V2 condition was
significantly faster at the main verb and the following region, suggesting a forgetting of
the VP2 prediction and possibly additional difficulty at the main verb in the grammatical
condition. In eyetracking there was also a surprising drop-V2 facilitation effect on the
first verb (V3) that is not predicted by the forgetting hypothesis. This, however, is explained
by Vasishth and colleagues as an artifact of complexity-induced re-reading behavior in
the grammatical condition.
German

Experiments identical to those laid out above were also carried out by Vasishth et al. (2008)
in German. An example stimulus pair for the grammaticality manipulation is shown in
example (19). The resulting structure of the German ORC double-embedding is identical to
the one in English except for the commas. The comma issue will be addressed after the
study results for German have been presented. The investigation of the NP-forgetting
Hypothesis yielded results analogous to the English study. Surprisingly, however, the
VP-forgetting Hypothesis was not confirmed. On the contrary, it was the grammatical
condition that showed faster reading times at V1 and post-V1 in both SPR and eyetracking.
The eyetracking results for the German grammaticality manipulation are shown in
figure 2.7.

(19) a. Der Anwalt, den der Zeuge, den der Spion betrachtete, schnitt, überzeugte
den Richter. (grammatical)
'The lawyer who the witness who the spy watched snubbed convinced the judge.'
b. Der Anwalt, den der Zeuge, den der Spion betrachtete, überzeugte den
Richter. (drop-V2)
Figure 2.6: Effect of the grammaticality manipulation in the English eyetracking study
by Vasishth et al. (2008, experiment 2, p. 18): mean reading times and 95% confidence
intervals for the verb regions V3, V2, V1 and the post-V1 region, grammatical vs.
ungrammatical.
German readers do not seem to forget the VP prediction. In fact, they seem to notice
the ungrammaticality of the drop-V2 condition, which leads to increased reading time.
The surprising result points to the assumption that linguistic memory processes are not
language-independent but rather affected by language-specific grammatical properties.
The head-final nature of German (SOV) subordinate clauses causes verbs to appear
clause-finally more frequently than is the case in English, an SVO language. An objection
might be that commas in German facilitate the recognition of a completed clause:
a double-embedding involves a comma after each embedded main verb. Vasishth and
colleagues addressed this issue with a fifth experiment involving English sentences enriched
with commas. However, the comma inclusion did not show any effect. Nevertheless, as
Vasishth et al. note, this result does not exclude the possibility of a comma-based
facilitation. An important fact is that German readers are trained on using commas, while
English readers are not, which suggests that commas were of no use for the English
participants.

I will now investigate potential explanations for a language-specific forgetting effect.
[Figure: mean reading time (ms) per region (V3, V2, V1, Post-V1), grammatical vs. ungrammatical condition]

Figure 2.7: Mean reading times and 95% confidence intervals for the critical regions in the German eyetracking study by Vasishth et al. (2008) (experiment 4). The figure shows the effect of grammaticality.
2.5.2 Explaining the Forgetting Effect

Capacity

Just and Carpenter (1992) explicitly mention the possibility of forgetting certain predictions in the CC-READER model ("forgetting by displacement"). The underlying mechanism is equivalent to the pruning hypothesis of Gibson and Thomas (1999), thus making basically the same predictions as the DLT-based approach described above. As has just been laid out, the VP-forgetting hypothesis is seemingly not cross-linguistically valid. Specifically, the hypothesis has been confirmed for English but disconfirmed for German. A memory-based explanation could account for the language-specific difference in two ways: either by postulating a language-dependent capacity limit or a language-dependent robustness of VP predictions. However, theories like the DLT regard memory processes as universally valid. Thus, assuming that the DLT-based hypothesis applies in both languages could only mean that German readers possess a higher memory capacity than English readers. There should then be evidence for that capacity difference from reading span tasks and, considering working memory capacity as domain-unspecific, there should also be evidence from non-linguistic working-memory-related tasks. Since there is no such evidence for a language-specific memory span, a pure DLT-based hypothesis cannot account for the non-existence of the forgetting effect in German. There have to be additional factors that affect the robustness of the VP2 prediction representation.
The most promising explanation is that processing is affected by certain language-specific grammatical properties. Vasishth et al. (2008) mention two possibilities as to how this could come about: (a) the robustness of the verb representation is directly specified by the same parameters that shape the grammar and hence the production-based corpus regularities; (b) alternatively, the more robust representation could be due to more effective processing caused by reading skill, which is affected by the mentioned corpus regularities and not by the parameters directly. The first possibility is matched by an expectation-based account that directly depends on grammatical properties. A canonicity account would likewise predict SOV structures to be easier in German than in English. The alternative of reading skill is accounted for by an experience-based approach.
Expectation
A door to language-specific effects could be antilocality. Antilocality has been observed predominantly in head-final languages like German (Konieczny, 2000) and Hindi (Vasishth and Lewis, 2006b). The seeming restrictedness of these effects to head-final languages has led to the suggestion that the sentence-final verb in these languages is more strongly expected than in non-head-final languages. However, a recent study by Jaeger et al. (2008) shows antilocality effects in English, which is not head-final. The cross-linguistic explanation is that the expectation for a verb increases with more intervening material: the longer the distance between the dependent and the expected head, the less likely it becomes that even more adjunctive material will intervene before the head. Additionally, in most cases the intervening material narrows down the possible candidates for the head, which lowers surprisal even more. The fact that the associated speed-up at the verb shows slightly different patterns in English and German encourages an expectation-based account of a language-specific forgetting effect. For example, it is conceivable that in head-final languages the prediction is more precise regarding the exact location of the verb, whereas in other languages head-finality is too rare to provide exact verb-location statistics.
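The logic of this account can be made concrete with a toy surprisal computation. The continuation counts below are invented for illustration only (they come from no corpus); they merely encode the claim that, after more intervening material, a verb becomes the more probable next constituent:

```python
import math

# Hypothetical counts of what follows a clause prefix that already
# contains k intervening phrases (invented numbers for illustration).
continuations_after = {
    0: {"verb": 20, "adjunct": 50, "argument": 30},
    1: {"verb": 45, "adjunct": 30, "argument": 25},
    2: {"verb": 80, "adjunct": 15, "argument": 5},
}

def verb_surprisal(k):
    """Surprisal (in bits) of seeing the verb after k intervening phrases."""
    counts = continuations_after[k]
    p_verb = counts["verb"] / sum(counts.values())
    return -math.log2(p_verb)

for k in (0, 1, 2):
    print(k, round(verb_surprisal(k), 2))
```

As the distance grows, the verb's conditional probability rises and its surprisal falls, which is the antilocality speed-up described above.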
Experience
The robustness of representations could be shaped by experience. An experience-based account assumes that the reader adapts processing strategies to frequently occurring structures. As a result, German readers should be more skilled at head-final structures than English readers. An explanation based on coarse-grained corpus frequencies would be equivalent to a canonicity approach: German, being an SOV language, exhibits more head-final structures than English, predicting easier processing. But earlier discussions
have hopefully made clear that these theories rest on weakly justified fundamental commitments. The more comprehensive approach pursued here is experience with as few commitments as possible with respect to grain size and structural frames. In this sense, a connectionist implementation in a model like that of MacDonald and Christiansen (2002) promises to show a systematic difference in the performance on English and German double center-embedding. In fact, Christiansen and Chater (1999) have shown that such a model, trained on center-embedding and right-branching structures, shows better prediction performance on the ungrammatical 2VP embedding than on the grammatical 3VP embedding (see chapter 3 for details). Since right-branching and center-embedding reflect the dependency structure in English SRCs and ORCs, respectively, Christiansen and Chater's simulation shows that connectionist experience models can in fact account for forgetting effects comparable to human data. But would this model show a different performance when trained on a German grammar? This depends on the involved word order regularities. In contrast to English, German SRCs and ORCs both exhibit center-embedded dependencies and hence are verb-final. This is a considerable bias and should have an effect on the model's performance.
To anticipate the simulation results in chapter 4: the word order effects are present but weak. In contrast, the usage of commas seems to have a greater impact.
Chapter 3
Connectionist Modelling of Language Comprehension

3.1 Structure and Learning
Connectionist networks are prototypical exposure-based models. More precisely, they are the implementation of non-committed exposure-based accounts. "Non-committed" here means accounts without any specific assumptions about structural levels or grain sizes, nor about the linking between corpus regularities and behavior. In the literature there does not seem to be agreement about which models to call connectionist. It must be mentioned that there are, on the one hand, hybrid models that use parallel distributed activation spreading between symbolic entities (e.g. Just and Carpenter, 1992; Lewis and Vasishth, 2005) and, on the other hand, connectionist models that use hand-designed architectures and local representations (e.g. Dell et al., 2002; McClelland and Elman, 1984; Rohde, 2002). I am concerned here only with "fully connectionist models" using fully distributed representations and no pre-designed internal structuring.

The most important feature that distinguishes a connectionist network model of that kind from symbolic models is its architecturally constrained, highly adaptive learning ability. Connectionist models are functional problem-solving machines that, depending on the specific learning algorithm and certain architectural properties, are able to find an optimal solution to any task representable as input-output pairs. The design of symbolic models mostly involves many assumptions about the desired processes, which are hard-coded into the system. For example, it has to be specified how to categorize and represent the input. A connectionist system, on the other hand, starts from zero, without any presumptions. The structure of the internal input representation is shaped during the learning process depending on the task requirements. Obviously, the structural information that linguists annotate on word strings of natural language is already there in the plain strings. Extracting the underlying structure of the input requires information about sequential and temporal relations between input chunks. For that reason, time is an important component of cognitive tasks. Language in particular transports highly structured information while being entirely sequential. A memory of earlier input and the representation of temporal relations between input chunks provide much contextual information that helps to interpret the current input. The context of an utterance has a great influence on ambiguity resolution and predictions of content.

There have been some accounts of providing connectionist networks with a temporal representation of an explicit nature, which posed limits to the number and richness of representations. Elman (1990) describes a simple way to provide a connectionist network with memory, called a simple recurrent network (SRN, figure 3.1). The hidden representations in the network are copied into a so-called context layer, which influences the hidden representations in the next step through weighted activation feeding. This memory loop goes without any explicit representation of time or of relations between input chunks. It is the iterative procedure of copying and back-feeding itself that produces temporal relations on an implicit level. Because every copy of the activation pattern has been influenced by all earlier copies, the contextual memory reaches into the "past" in a continuously graded way over several input steps. The information of earlier input representations is still in the representation as a trace, but newer input has more influential power. Elman (1990) writes:

“In this account, memory is neither passive nor a separate subsystem. One cannot properly speak of a memory for sequences; that memory is inextricably bound up with the rest of the processing mechanism.”

This very simple way of supplying memory yields architecturally determined, plausible properties that can abstractly be described as storage limitations, memory span, or decay of memorized representations over time. These are properties explicitly accounted for in symbolic models like ACT-R or CC-READER.
[Figure: input, context, hidden, and output units of the SRN]

Figure 3.1: Architecture of a simple recurrent network (SRN; Elman, 1990). The solid line represents fixed one-to-one connections to the context layer. Dashed lines represent trainable connections.
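The copy-and-feed-back loop just described can be sketched in a few lines. This is a minimal illustration with invented layer sizes and random, untrained weights, not a reconstruction of Elman's original setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes: 5 input, 8 hidden, 5 output units.
n_in, n_hid, n_out = 5, 8, 5

# Trainable weights (the dashed lines in figure 3.1).
W_ih = rng.normal(scale=0.1, size=(n_hid, n_in))   # input -> hidden
W_ch = rng.normal(scale=0.1, size=(n_hid, n_hid))  # context -> hidden
W_ho = rng.normal(scale=0.1, size=(n_out, n_hid))  # hidden -> output

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def run_sequence(inputs):
    """Process a sequence step by step; the context layer is a one-to-one
    copy of the previous hidden activations (the solid line in figure 3.1)."""
    context = np.full(n_hid, 0.5)  # initial context, as in Elman (1990)
    outputs = []
    for x in inputs:
        hidden = sigmoid(W_ih @ x + W_ch @ context)
        outputs.append(sigmoid(W_ho @ hidden))
        context = hidden.copy()    # the copy itself is fixed, not learned
    return outputs

# Three one-hot "words" as a toy input sequence.
seq = [np.eye(n_in)[i] for i in (0, 2, 1)]
preds = run_sequence(seq)
print(len(preds), preds[0].shape)  # prints: 3 (5,)
```

Training would adjust W_ih, W_ch, and W_ho by backpropagation on a word-prediction task; the hidden-to-context copy always keeps its fixed weight of 1.0.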
The architecturally emergent nature of the task solution is also often the subject of criticism concerning connectionist models. The argument goes that the explanatory value of connectionist models is low since there is no way to know what the network has really learned. Hence, these models do not provide any insight into parsing mechanisms or memory representations. Addressing that issue, Elman (1990) showed that an SRN in a sense learns the lexical classes as we use them. For that purpose, Elman trained the network on a simple word prediction task that involved simplified natural sentences such as Noun-Verb or Noun-Verb-Noun sequences. The comparison of hidden unit activation vectors yielded a hierarchical clustering of words into natural classes like verbs and nouns, which were sub-clustered by transitivity and animacy, respectively. The categorial clustering of representations depends directly on the "behavioral" relations of the words: words often occurring in similar environments end up with similar representations in the hidden layer activation. When an input word is replaced after training by a new word that the network has never seen, the network will internally represent that word in the same way as the replaced word, since it behaves in the same way.

An advantage of distributed representations with a continuous activation range is that they theoretically provide infinite memory capacity. That, however, does not mean that connectionist systems are able to store infinitely many equally treated representations, as would be the case in theoretical symbolic models with infinite capacity. Representations of higher grain size, i.e. involving a higher activation range, have more influential power with respect to the network's behavior than more fine-grained activation patterns do. So what grades memory (and hence shapes memory capacity) is the relative importance of the stored representations. This relates to the type-token issue. As explained above, connectionist networks are able to discern types like lexical categories, but they do not ascribe any meaning to single tokens. However, that does not mean that connectionist networks do not functionally distinguish between different tokens. As Elman also showed, tokens of the same type are represented very similarly but with subtle differences. Notably, these differences exist not only between tokens but also within tokens, distinguishing usages of a token in slightly differing contexts. In this way a token is a fuzzy category on a continuous scale. This makes sense philosophically, since the boundary between type and token is often a matter of point of view, which is why tokens, too, can often be analyzed as having an internal structure.
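The comparison of hidden unit activation vectors that underlies this kind of analysis can be sketched with cosine similarity (Elman himself used hierarchical clustering over such vectors). The vectors below are invented stand-ins for averaged hidden activations, not output of a trained network:

```python
import numpy as np

# Invented averaged hidden activations for four words (stand-ins for
# vectors that would be recorded from a trained SRN's hidden layer).
hidden = {
    "dog":   np.array([0.90, 0.10, 0.20, 0.80]),
    "cat":   np.array([0.85, 0.15, 0.25, 0.75]),
    "eat":   np.array([0.10, 0.90, 0.70, 0.20]),
    "chase": np.array([0.15, 0.85, 0.75, 0.25]),
}

def cosine(a, b):
    """Cosine similarity between two activation vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Words with similar distributional behavior end up close together.
print(cosine(hidden["dog"], hidden["cat"]))  # high
print(cosine(hidden["dog"], hidden["eat"]))  # low
```

Clustering such similarities hierarchically recovers the noun/verb split and the subclasses, which is the essence of Elman's analysis.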
3.2 Recursion and Complexity
Recursion is considered to be a language-independent structural phenomenon which causes processing difficulty. Besides the limit on recursion depth in the human language processor, there are two rather surprising properties of human processing ability on recursive structures. First, in a comprehensibility rating study, Bach et al. (1986) compared the difficulty of Dutch cross-dependency versus German center-embedding relative to right-branching and found cross-dependency to be easier than center-embedding at deeper levels of embedding. Second, a number of studies (Bach et al., 1986; Blaubergs and Braine, 1974; Christiansen and MacDonald, 1999) showed increasing difficulty for iterated right-branching. These performance patterns are not predicted by grammatical complexity.
Cross-dependency is seen as especially complex because it is not representable by context-free phrase structure rules. Repeated right-branching, on the other hand, is not predicted to affect comprehension in most symbolic models.
Christiansen and Chater (1999) showed in a series of simulations that the word-by-word prediction performance of simple recurrent networks on recursive structures exhibits exactly the three properties just mentioned for the human comprehender. Christiansen and Chater assessed the performance of SRNs on different types of grammatical complexity. In particular, they used the three types of recursion proposed by Chomsky (1957): counting recursion, mirror recursion (center-embedding) and identity recursion (cross-dependency). As a baseline they also included iterative right-branching structures. Christiansen and Chater constructed three artificial languages that each covered one of the three recursion types plus right-branching. These were used to create training corpora containing half right-branching, half recursive structures. The SRNs trained on these corpora performed best on counting recursion during testing, followed by cross-dependency, and had the most trouble with center-embedding. Similar to the human results, the preference for cross-dependency over center-embedding occurred only at embedding levels of two and upward. In single embedding the two recursion types were handled equally well, resembling the human results of Bach et al. Also consistent with the human data, the SRNs showed declining performance with deeper embedding on right-branching structures.
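The three recursion types can be made concrete with small string generators. The two-word vocabulary and the depth parameter are hypothetical, intended only to show how the agreement dependencies differ:

```python
import random

NOUNS = ["N1", "N2"]
VERBS = {"N1": "V1", "N2": "V2"}  # each noun agrees with "its" verb

def counting(n):
    """Counting recursion: n nouns, then n verbs, with no agreement."""
    return ["N"] * n + ["V"] * n

def mirror(n):
    """Mirror recursion (center-embedding): verbs agree in reverse order."""
    nouns = [random.choice(NOUNS) for _ in range(n)]
    return nouns + [VERBS[x] for x in reversed(nouns)]

def identity(n):
    """Identity recursion (cross-dependency): verbs agree in serial order."""
    nouns = [random.choice(NOUNS) for _ in range(n)]
    return nouns + [VERBS[x] for x in nouns]
```

If the randomly chosen nouns happen to be ['N1', 'N2'], mirror(2) returns ['N1', 'N2', 'V2', 'V1'] (nested dependencies), whereas identity(2) returns ['N1', 'N2', 'V1', 'V2'] (crossed dependencies).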
The consistency of the SRNs' predictions on recursive structures even included the phenomenon of the grammaticality illusion in double center-embedding. The SRN simulation was able to replicate the forgetting effect described in section 2.5. After training on the center-embedding language, the networks were tested on grammatical 3VP and ungrammatical 2VP constructions. Mean error scores showed that the network favored the ungrammatical NNNVV sequence over the grammatical NNNVVV sequence. Figure 3.2 shows the SRN's output node activations after seeing the second verb. The high activation of the end-of-sentence marker (EOS) clearly indicates that the network expects the sentence to be complete at this point.
Interestingly, in all the simulations the number of hidden units did not affect the SRNs' performance on recursion, given a number higher than 15. Thus, contrary to a criticism that is often brought forward, the hidden layer size is not comparable to a capacity parameter in symbolic models, which can theoretically be increased to capture recursion depth into infinity. But what, then, is memory capacity in a connectionist network? MacDonald and Christiansen (2002) say the following:
“To the extent that it is useful to talk about work<strong>in</strong>g memory with<strong>in</strong> these<br />
systems, it is the network itself; it is not some separate entity that can vary<br />
<strong>in</strong>dependently <strong>of</strong> the architecture and experience that governs the network’s<br />
process<strong>in</strong>g efficiency.” (p. 38)<br />
[Figure: mean output activation by lexical category (singular and plural nouns, singular and plural verbs, and EOS), with bars marked as indicating a 2VP preference, a 3VP preference, or erroneous activation.]

Figure 3.2: Ungrammaticality simulation by Christiansen and Chater (1999) (their figure 10). Mean output activation of lexical categories with error bars indicating standard error.
In their view, capacity is only a higher-level description of an SRN’s behavior: changing aspects of the architecture or the training always affects both memory and processing. In contrast, Just and Varma (2002) claim that noisy input to a network would be comparable to changing the capacity limit in symbolic models like CC-READER; in the network, they say, the representational quality would be affected while the grammatical knowledge would stay constant. Whichever view is correct, one can probably say that the greatest influence on memory capacity comes from the mechanism of the temporal loop and the backpropagation procedure. Using a different learning algorithm, for example backpropagation through time, can increase the network’s memory span.
3.3 A Model of RC Processing

3.3.1 MacDonald and Christiansen (2002)

MacDonald and Christiansen (2002) (MC02) presented a connectionist model covering individual and global differences in relative clause comprehension in English. By mapping the word-by-word prediction performance of an Elman network onto reading times, they showed an impressively accurate fit to the results of King and Just (1991). In doing so, MC02 directly attacked the importance of a discrete memory component for the subject/object difference and for individual differences in the comprehension of relative clauses. Since MC02’s model serves as the basis for the simulations in chapter 4, I will here describe
their experiment in detail. MC02 used a standard simple recurrent network (SRN) with a hidden and a context layer of 60 units each. Input and output layers of 31 units each represented 30 words plus an end-of-sentence (EOS) symbol. The corpora each consisted of 10,000 English sentences constructed randomly from a simple artificial probabilistic context-free grammar (PCFG). Subject- or object-modifying relative clauses were contained in 5% of the sentences; half were subject-extracted and half were object-extracted RCs. The rest of each corpus consisted of simple mono-clausal sentences. Verbs differed in transitivity and agreed in number with their subject nouns. Each corpus consisted of about 55,000 words; sentence length ranged from 3 to 27 words with a mean of 4.5. Notably, relative clauses could be embedded recursively in each noun phrase, with the RC attachment probability in the PCFG (0.05) limiting the embedding depth. MC02 trained 10 networks with randomly distributed initial weights¹, each on a different corpus. The learning rate was set to 0.1. The training phase covered only three epochs, each consisting of one pass through the corpus. The networks learned to predict the next word in a sentence without being provided with any probabilistic information. The output unit activations were calculated by a cross-entropy algorithm which ensured that all activation values summed to one; in that way the networks’ output was comparable to continuation likelihoods assigned to each possible word. After training, the networks were assessed on 10 sentences of each of the three types (SRC, ORC, and simple clause).
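The architecture just described can be sketched in a few lines. The following is a minimal, self-contained Elman network for next-word prediction. It is not MC02’s implementation: the toy vocabulary, the layer sizes, the training corpus, and the use of a softmax output (standing in for the normalization that makes the activations sum to one) are illustrative assumptions; only the initial weight range and the learning rate follow the values reported above.

```python
import numpy as np

# Minimal Elman SRN sketch for next-word prediction (illustrative, not MC02's
# exact setup). One-hot inputs, tanh hidden layer with copy-back context,
# softmax output as a stand-in for activations that sum to one.

rng = np.random.default_rng(0)

VOCAB = ["boy", "girl", "runs", "sees", "EOS"]
V, H = len(VOCAB), 10  # vocabulary size and hidden/context layer size

# Initial weights drawn uniformly from [-0.15, 0.15], as reported for MC02
W_ih = rng.uniform(-0.15, 0.15, (H, V))  # input   -> hidden
W_hh = rng.uniform(-0.15, 0.15, (H, H))  # context -> hidden (copy-back loop)
W_ho = rng.uniform(-0.15, 0.15, (V, H))  # hidden  -> output

def one_hot(i):
    v = np.zeros(V)
    v[i] = 1.0
    return v

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def train_sentence(words, lr=0.1):
    """One pass over a sentence, predicting each next word. Gradients are
    truncated to the current time step (no backpropagation through time)."""
    global W_ih, W_hh, W_ho
    h = np.zeros(H)  # context starts empty at sentence onset
    idx = [VOCAB.index(w) for w in words]
    for t in range(len(idx) - 1):
        x = one_hot(idx[t])
        h_prev = h
        h = np.tanh(W_ih @ x + W_hh @ h_prev)  # new hidden state
        y = softmax(W_ho @ h)                  # next-word distribution
        dy = y - one_hot(idx[t + 1])           # softmax + cross-entropy grad
        dh = (W_ho.T @ dy) * (1.0 - h ** 2)    # backprop through tanh
        W_ho -= lr * np.outer(dy, h)
        W_ih -= lr * np.outer(dh, x)
        W_hh -= lr * np.outer(dh, h_prev)

def predict(words):
    """Distribution over the word following the given prefix."""
    h = np.zeros(H)
    for w in words:
        h = np.tanh(W_ih @ one_hot(VOCAB.index(w)) + W_hh @ h)
    return softmax(W_ho @ h)

# Tiny training corpus: a noun is followed by a verb, a verb by EOS
corpus = [["boy", "runs", "EOS"], ["girl", "sees", "EOS"],
          ["boy", "sees", "EOS"], ["girl", "runs", "EOS"]]
for _ in range(200):
    for sentence in corpus:
        train_sentence(sentence)

print(VOCAB[int(np.argmax(predict(["boy"])))])          # a verb should win
print(VOCAB[int(np.argmax(predict(["boy", "runs"])))])  # EOS expected here
```

The copy-back loop (`W_hh @ h_prev`) is what gives the network its implicit memory: the hidden state at each step is a compressed trace of the whole preceding word sequence, which is exactly why “capacity” cannot be separated from the rest of the architecture.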
For interpreting the network output in terms of processing difficulty, MC02 calculated the so-called grammatical prediction error² (GPE). The GPE value is a measure of the network’s difficulty in making the correct prediction at each word. The measure was then used to map the relative word-by-word differences between the conditions onto reading times from the study by King and Just (1991). Besides RC type, MC02 used training epochs as a second factor: the network performances after one, two, and three epochs of training were compared to the reading speed of low-, mid-, and high-span readers.
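The exact GPE computation is described in chapter 4. As a rough, hypothetical illustration of the underlying idea, one can measure how much output activation the network assigns to continuations the grammar does not license; the function name and the example values below are invented for illustration and simplify away the full GPE formula.

```python
# Rough illustration of a GPE-style difficulty measure: the share of output
# activation assigned to continuations the grammar does NOT license.
# This is a simplification, not MC02's exact GPE (see chapter 4), and the
# words and activation values are invented for illustration.

def prediction_error(activations, licit):
    """activations: dict word -> output activation (values sum to 1).
    licit: set of grammatically possible next words at this position."""
    hits = sum(a for word, a in activations.items() if word in licit)
    return 1.0 - hits  # 0.0 = all mass on licit words, 1.0 = none

# After an N N N V V prefix the grammar still requires a third verb, so the
# high EOS activation seen in figure 3.2 counts as error:
output = {"verb": 0.2, "EOS": 0.7, "noun": 0.1}
print(prediction_error(output, {"verb"}))  # 0.8
```

A high score at a word thus signals that the network’s expectations diverge from the grammar at that point, which is the quantity mapped onto word-by-word reading times.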
The results of MC02’s network simulation are shown in figure 3.3. Pooled over all three epochs, the results show a clear subject preference on the main verb (praised) and the preceding region (the embedded object in the SRC and the embedded verb in the ORC). Furthermore, the ORC performance shows significant improvement on the embedded and main verb over the three epochs of training. Notably, the SRC data do not show such an improvement; rather, performance was relatively good from the start, with no change during training. This indicates a clause type × exposure interaction. The same interaction (in this case clause type × reading span) is seen in King and Just’s empirical data (figure 2.1). Notably, the simple SRN model seems to make better predictions than the CC-READER model by Just and Carpenter (1992), since CC-READER captures the span effect but not the interaction with clause type (see figure 2.4). Importantly, MC02 call this interaction a Frequency × Regularity interaction. Specifically, the regular nature of English SRCs with respect to word order (SVO) serves
¹ Between -0.15 and 0.15.
² See chapter 4 for a detailed description.
as a sort of familiarity boost that makes them easier to handle. As described in section 3.1, the representational similarity (and with it the related prediction behavior) of words is bound to their occurrence behavior in context. Since the representation of each input word also includes traces of each preceding word (i.e. the currently preceding structure), similar structures result in similar internal representations, which again result in similar prediction behavior. Consequently, the relatively quickly gained “knowledge”³ about simple sentences (95% of the corpus) also influences the “knowledge” about the structurally similar SRCs.

[Figure: reproduction of figure 2 in MacDonald and Christiansen (2002). Grammatical prediction error scores averaged over 10 novel sentences of each kind and grouped into four regions to facilitate comparison with the human data (see Just & Carpenter, 1992, Figure 9, p. 140).]

Figure 3.3: Frequency × Regularity interaction of SRCs and ORCs. Performance of the model in MacDonald and Christiansen (2002). Error bars show standard error.
Theoretically one could invent a structure which is similar to SRCs but to which the network has not been exposed in training; the SRN would nonetheless be able to handle it, benefiting from previous experience. The reason for the absence of the regularity benefit in ORCs is their deviation from all other structures in the corpus. While in the SRC reporter, attacked, and senator appear in the regular NVN ordering, the ORC exhibits an NNV ordering. Specifically, number agreement serves as a structural hint for differentiating the word orders and for preventing wrong regularities from being learned: the network gains implicit information about subject-object relations from the input data through the structural fact that verbs agree in number with subjects but not with objects.
³ I put knowledge in quotes here because what is meant in a connectionist system is not declarative or explicit knowledge but rather implicit behavioral knowledge provided by the connection weights.

The simulations show that a purely exposure-based model is able to predict complex interactions of complexity-related reading difficulties and individual differences at the word level. This evidence led MC02 to the formulation of their Skill-through-Experience Account (p. 44), which attacks the modular picture of knowledge and memory. The crucial claim of MC02 is that differences in performance result from processing skill as a function of experience, not from a separable WM capacity.
“In our view, neither knowledge nor capacity are primitives that can vary independently in theory or computational models; rather they emerge from the interaction of network architecture and experience.” (p. 37)
The subsymbolic (and behavioristic) nature of connectionist networks makes grammatical knowledge and processing indistinguishable. A change in parameters like weight vectors or hidden layer size is not attributable to one of the two components; rather, it affects the behavior of the whole network.
3.3.2 Critique and Relation to other Approaches

MC02 see their model mainly as an opposition to models like Just and Carpenter (1992) and Waters and Caplan (1996), which explicitly account for memory capacity limitations. MC02’s SRN simulations have important implications with respect to the biological plausibility of processing models. They demonstrated that there is no need to assume separable working memory and knowledge modules in order to account for effects attributed to these; rather, experience shapes the whole system, and capacity is a property emergent from the system’s architecture. This emphasizes the role of symbolic models like Just and Carpenter (1992) as merely higher-level descriptions of underlying processes. There is, of course, nothing wrong with symbolic descriptions. What is in question, however, is the justification of explicit numerical limits on capacity. In Just and Carpenter’s account the capacity limit is defined as the maximal amount of activation attributed to productions (processing rules), and this limit can be varied without touching the rest of the system. Following MC02, however, this value is in fact emergent and inseparable from the entire system. As a consequence, sentence comprehension and reading span measure the same thing, namely reading skill, which is the experience-shaped efficiency of linguistic processes.
Non-Linguistic Working Memory

Not convinced by this view, Roberts and Gibson (2002) note that a pure skill-via-experience account would not be able to explain the correlations of comprehension skill with non-linguistic working memory tasks that do not involve sentence reading, and they provide empirical evidence for correlations of sentence memory with several memory load tasks that do not involve reading sentences. Addressing these correlations, MC02 propose that reading skill is tied to phonological representations. These representations play the crucial role in all sorts of memory load tasks and account for individual differences. Regarding phonological representations, MC02 formulate four important claims (p. 45):
(a) “Phonological and articulatory representations must be activated in order to utter the words for the load task”.

(b) “Phonological activation is an important component of written and spoken sentence comprehension, particularly for certain difficult sentence structures”.

(c) “The extent to which phonological representations are important during comprehension of difficult syntactic structures is likely to vary inversely with experience, such that phonological information is more crucial for less experienced comprehenders”.

(d) “There appear to be notable individual differences in the ‘precision’ of phonological representations computed during language comprehension, and these differences are thought to owe both to reading experience and to biological factors.”
As becomes clear, MC02 do not completely deny an influence of biological factors on processing skill. These factors, however, concern the precision of representations, not capacity limitations, and that precision is itself subject to experience-driven variance. Moreover, individual differences are assumed to lie primarily in the degree of dependence on these phonological representations, meaning that highly skilled readers exhibit more efficient processing that relies less on phonological information. For example, in extrinsic load tasks⁴ both the stored items and the sentence comprehension processes make use of shared phonological representations. Thus MC02 explain load effects by activation interference rather than activation limits. This is seen as evolving naturally from evidence that articulatory planning involves strict activation and inhibition of phonological units (Bock, 1987; Dell and O’Seaghdha, 1992). During extrinsic load tasks, then, activation and inhibition processes from both the load and the comprehension mechanism operate on the same representations, interfering with each other. The more efficient processing of highly experienced readers makes less use of these representations and thus reduces difficulties due to interference. The same processes also occur in the reading span task (which is basically the same task as extrinsic load). The conclusion is that reading span is a function of experience and not of memory capacity. This account is also superior to Waters and Caplan (1996), whose theory assumes two separate working memories and hence does not predict an interaction of comprehension and extrinsic load. Furthermore, RC type differences are explained in the same way, namely that “object relatives, which are more challenging than subject relatives, are likely to rely more on phonological information than subject relatives” (p. 45).
⁴ In these tasks participants are asked to memorize a set of words or digits and retain it while reading sentences. The extrinsic load influences sentence comprehension performance in a way that correlates with the participants’ reading span value (Just and Carpenter, 1992; King and Just, 1991).
German Word Order
Konieczny and Ruh (2003) ran simulations on German relative clauses using the model parameters of MC02. The results are inconsistent with the empirical subject preference: German ORCs clearly exhibit lower error rates on the embedded verb. In addition, the results do not show a frequency × regularity interaction. This is not so surprising considering German word order properties. In English the regularity effect is attributed to the shared SVO ordering of main clauses and SRCs, which separates both structures from the SOV-ordered ORCs. In German, by contrast, while main clauses commonly exhibit an SVO ordering as in English, the order in SRCs and ORCs is SOV and OSV, respectively. In addition, the free word order in German also allows an OVS structure in main clauses. This makes four different possible word orders in German, which are not expected to give rise to a regularity preferring one of the two RC types. As pointed out in 1.3.3, a canonicity account based on thematic orderings cannot make clear predictions assuming SVO as the canonical ordering; an SOV-canonicity account, on the other hand, would make the correct predictions. However, to derive a structural frequency-based SOV regularity, the structural scope of the model would have to be extended, since simple main clauses do not provide the desired regular structures as they do in English. I suspect that including sentential complements and other subordinate clauses exhibiting an SOV pattern would result in the desired frequency × regularity interaction with an advantage for SRCs. The exact reason for the actual advantage of ORCs over SRCs in the current model will be discussed at the end of the next section, which deals in more detail with the SRN’s structure-based learning in RC processing.
3.3.3 What is learned?

As laid out in the previous section, the model by MacDonald and Christiansen (2002) has been criticized on several issues. It is not completely evident how much of the networks’ predictions can be attributed to a frequency × regularity effect and how much is merely an artifact. A mere correlation between network experience and human reading span is not sufficient evidence for an experience effect in human readers. Similarly, the effect of structural regularity differences between SRCs and ORCs does not necessarily cause the preference pattern for human readers. Also, assuming the conclusions drawn from the mentioned correlations are correct, the question remains what exactly the learning effect in relative clauses is based upon. An exposure-based theory driven by structural frequency needs to say something about the specific structural cues essential for shaping the efficiency of RC processing.
Assessing Experience

Wells et al. (2009) designed an experiment that took up the challenge of replicating the effects of experience with human readers, in order to have a basis for assessing the accuracy of
MC02’s SRN predictions. For that reason two participant groups were formed: an RC<br />
experience group and a control experience group, both matched on read<strong>in</strong>g span. Both<br />
groups received read<strong>in</strong>g tra<strong>in</strong><strong>in</strong>g over 3-4 weeks. The RC experience group received<br />
tra<strong>in</strong><strong>in</strong>g ma<strong>in</strong>ly on SRC and ORC structures whereas the control experience group was<br />
exposed to other structures. A pre- and a post-test carried out <strong>in</strong> the SPR paradigm<br />
assessed both groups RC read<strong>in</strong>g performance. The results show an overall improvement<br />
<strong>of</strong> RC read<strong>in</strong>g. Most importantly, however, the data revealed an <strong>in</strong>teraction <strong>of</strong><br />
session × experience group × clause type × region as can be seen <strong>in</strong> figure 3.4.<br />
There was a reliable difference between SRC and ORC reading times in both groups in the pre-test; in particular, the ORC condition was read more slowly. The subject/object difference, however, decreased significantly between pre- and post-test for the RC experience group, whereas in the control group it stayed the same. Wells et al. attribute the global reading improvement to increased familiarity with the SPR task. The observed pattern in the experience group is similar, first, to the span × clause type interaction in the study by King and Just (1991) reported in 2.1, and second, it resembles the frequency × regularity interaction of the connectionist model in MC02 (see figure 3.5). Statistically, the SRNs' mean GPE scores predicted the within-sentence and experience-based variance in the human data extraordinarily well, with a total fit of GPE and reading times of R² = .75. In a hierarchical regression, Wells and colleagues also predicted the SRN simulation results using the human reading times. The RC experience group data accounted for 75% of the overall variance in the SRNs' GPE scores. Using the control group data as a predictor, which involved no experience factor but merely within-sentence variance, the regression accounted for only about 65% of the GPE variance.
The impressive result of Wells et al.'s study delivers empirical support for the implications MC02 drew from the SRN simulations, namely that experience can account for individual differences in reading skill. Notably, the Wells et al. study demonstrated a significant experience effect after a minimal amount of exposure (the training sets contained only 160 sentences in total). Furthermore, lexical and structural short-term priming effects were excluded by, first, a four-week distance between the tests, second, the absence of lexical overlap between the tests, and third, the use of RC constructions during training that were structurally different from the test items. Task-related adaptation can also be excluded, since a full-sentence display was used during the training phase instead of word-by-word reading.
Now, given the evidence that reading skill on certain sentence structures is affected by previous experience with these structures, the question remains what has been learned; a question that evolves from the granularity problem of exposure-based theories described in section 1.3.4. Wells and colleagues identify verb transitivity as a crucial factor driving the learning process of the SRN. In ORCs, on the one hand, embedded verbs are necessarily transitive because the head noun has to fill an object role. On the other hand, simple sentences and SRCs occur with transitive and intransitive verbs. As a consequence, only for continuation predictions of ORCs does the network have to learn to differentiate verb transitivity, whereas for the continuation
57
Length-Adjusted Read<strong>in</strong>g Time (ms)<br />
Chapter 3 <strong>Connectionist</strong> Modell<strong>in</strong>g <strong>of</strong> Language Comprehension<br />
Figure 1: Self-Paced Read<strong>in</strong>g Patterns at Pre- and Posttest<br />
175<br />
150<br />
125<br />
100<br />
75<br />
50<br />
25<br />
0<br />
-25<br />
-50<br />
-75<br />
1 2 3 4<br />
OR: (The) clerk that the typist tra<strong>in</strong>ed told the truth<br />
SR: (The) clerk that tra<strong>in</strong>ed the typist told the truth<br />
36<br />
Relative Clause <strong>Experience</strong><br />
RC <strong>Experience</strong> Group (n=32) Control <strong>Experience</strong> Group (n=32)<br />
Pretest, Object Relatives<br />
Pretest, Subject Relatives<br />
Posttest, Object Relatives<br />
Posttest, Subject Relatives<br />
175<br />
150<br />
125<br />
100<br />
75<br />
50<br />
25<br />
0<br />
-25<br />
-50<br />
-75<br />
1 2 3 4<br />
(The) clerk that the typist tra<strong>in</strong>ed told the truth<br />
(The) clerk that tra<strong>in</strong>ed the typist told the truth<br />
Figure 3.4: Wells et al. (2009) read<strong>in</strong>g times for pre- and post-test by group and RC<br />
type.<br />
of other structures transitive and intransitive verbs are equally probable. Example (20) shows structural prefixes in main clauses (20a), SRCs (20b), and ORCs (20c) with the possible transitivity properties of the predicted verbs.

(20) a. Simple: EOS the N . . . {trans/intrans}
     b. SRC: the N that . . . {trans/intrans}
     c. ORC: (the N) that the N . . . {trans}
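The distributional asymmetry in (20) can be made concrete with a simple count over labeled continuations. The following sketch is purely illustrative: the observation list and counts are invented, not drawn from the actual training grammars.

```python
# Estimate P(transitive | structural prefix) from toy labeled continuations.
# Prefix types follow example (20); the observation counts are invented.
from collections import Counter

# (prefix_type, verb_class) pairs: simple sentences and SRCs mix both
# verb classes, while the ORC prefix is always continued transitively.
observations = [
    ("simple", "trans"), ("simple", "intrans"),
    ("SRC", "trans"), ("SRC", "intrans"),
    ("ORC", "trans"), ("ORC", "trans"), ("ORC", "trans"),
]

def p_transitive(prefix):
    counts = Counter(vc for pt, vc in observations if pt == prefix)
    return counts["trans"] / sum(counts.values())

print(p_transitive("simple"))  # mixed continuations
print(p_transitive("SRC"))     # mixed continuations
print(p_transitive("ORC"))     # 1.0: the ORC prefix forces a transitive verb
```

Only the ORC prefix is deterministic with respect to transitivity, which is exactly the regularity a prediction network could exploit.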
The experience effect in human readers, however, can be affected by many more structural cues in the input. A crucial factor, also mentioned for Chinese RCs in section 2.4, is animacy. In English, as in Mandarin, ORCs mostly contain inanimate head nouns, whereas in SRCs head nouns are commonly animate. Since Wells et al. used only animate head nouns for both SRCs and ORCs, the participants might have learned to handle the non-canonical animate-headed ORCs. Race and MacDonald (2003), Reali and Christiansen (2007a), and Reali and Christiansen (2007b) identify further probabilistic constraints that correlate with SRC/ORC corpus distributions. For example, pronominal ORCs mostly contain personal pronouns, whereas impersonal pronouns occur more frequently in SRCs. Furthermore, there are differences in the NP type of the embedded subjects that separate ORCs from other structures exhibiting an 'NP that NPSubj VP' sequence. The use of a pronoun in the NPSubj position is highly correlated with ORCs. The Wells et al. study used only common nouns, and all RCs were headed by the impersonal pronoun that; both are potential properties subject to probabilistic learning, since they are deviations from natural frequency patterns.
58
3.3 A Model <strong>of</strong> RC Process<strong>in</strong>g<br />
Relative Clause <strong>Experience</strong><br />
Figure 2. Comparison <strong>of</strong> the Human Self-Paced Read<strong>in</strong>g Patterns at Pre- and Post-test to the GPE Patterns Obta<strong>in</strong>ed for the SRNs.<br />
Figure 3.5: Wells et al. (2009) read<strong>in</strong>g times compared to model<strong>in</strong>g results <strong>of</strong> MacDonald<br />
and Christiansen (2002)<br />
The unnatural ORC patterns are probably the reason an experience effect emerged over such a short interval of only four weeks. Additionally, the unnaturalness might have inflated the clause type effect in reading times. I suspect that a study with completely natural sentences would show a smaller clause type difference and would need a much longer training period to show experience effects. However, this small-scale study clearly demonstrates the effect of structural probabilistic constraints on reading skill. Comparability with the SRN simulations is given by the transitivity constraint in ORCs, which is a plausible factor both in the simulations and in the human study.
A Detailed Prediction Analysis
The assumption that the SRN mainly learns verb transitivity in ORCs is, however, only speculation. Konieczny and Ruh (2003) carried out a more detailed analysis of the word-by-word predictions of MC02's SRN. They found that the high GPE scores on the embedded verb in the ORC and on the matrix verb in both RC types were caused by lexical misclassifications and local coherence. In particular, upon seeing the ORC sequence 'the N that the N', the SRN predicts the end-of-sentence marker EOS in early training. In later epochs the activation on EOS is reduced and increased on the correct verbs. Notably, the predictions for incorrect verbs (in this case all intransitive verbs plus those with non-matching number) do not change over training. This pattern is inconsistent with the transitivity hypothesis. The experience-based decrease of error on the ORC embedded verb is seemingly not caused by learning the distinction between transitive and intransitive verbs but rather by learning to separate the pronoun that from verbs.

The other region showing effects of clause type and experience is the main verb. At this point the SRN strongly predicts a determiner after seeing an ORC. This points to a locally consistent interpretation of the embedded '. . . the N Vtrans' sequence as a main clause prefix continuing with an NP. This misinterpretation is reduced in later training epochs. An additional and very stable false prediction on the main verb is the EOS after embedded SRCs and ORCs. Concerning the SRC, this is consistent with a locally coherent interpretation of the SRC sequence '. . . Vtrans the N' as part of a main clause. In the ORC, on the other hand, the EOS prediction after the '. . . the N Vtrans' sequence is only locally consistent when the transitive verb is interpreted as intransitive. At first sight this seems consistent with the assumption of Wells et al. (2009) that the SRN has to learn the trans/intrans difference. But surprisingly, the wrong EOS prediction increases with further training, indicating that the network does not recognize the transitivity of the embedded verb. Summarizing the analysis of Konieczny and Ruh, the effects on the embedded and main verb are caused by a) the interpretation of that as a verb, b) the prediction, due to local coherence, that the sentence ends after an embedded RC, and c) the failure to classify transitive and intransitive verbs. Konieczny and Ruh suggest removing verbs that can be both transitive and intransitive from the lexicon in order to separate the two classes more clearly. Furthermore, the grammar should allow the use of pronominal NPs to move the classification of that nearer to nouns than to verbs.
Concerning the German RC simulations, the explanation of the effects is quite simple. German SRCs and ORCs differ only in the serial order of the relative pronoun and the determiner of the embedded NP. Consequently, the SRC contains in this region a NOM-ACC sequence whereas the ORC contains an ACC-NOM sequence. The embedded verb always agrees with the nominative (der). This produces a locally consistent structure of 'detnom N V' in the ORC but not in the SRC. Following Konieczny and Ruh, this local consistency effect produces the correct predictions for the embedded verb in the ORC, which is the reason for the lower error. In the SRC the verb is bound to the relative pronoun, which shares its number with the matrix subject. The SRN's verb predictions, however, seem to be more influenced by the number of the intervening object than by the distant dependency.
3.3.4 Summary
In using SRNs, MacDonald and Christiansen (2002) take advantage of a simple mechanism that, without any architectural predesign, makes excellent predictions concerning the functional relation between exposure to certain structures and processing skill. The model's behavior is interpretable in terms of memory and decay, but due to its temporal loop and learning mechanism it is sensitive to context and experience. The King and Just data were fitted well, especially with respect to individual differences. The model results in combination with the study by Wells et al. (2009) provide a comprehensive skill-through-experience account that includes individual and language-specific differences. Konieczny and Ruh (2003) and others question the model's validity, partly because a detailed analysis shows that the learned constraints are of a local nature and not comparable to human learning. However, Christiansen and Chater (1999) showed that an SRN is an outstanding predictor, especially for complex embedding issues. Chapter 4 will now show useful SRN predictions on two further topics.
Chapter 4
Two SRN Prediction Studies
This chapter is concerned with the connectionist simulation of the subject/object difference in Mandarin and the forgetting effect in English and German. As the previous chapters have shown, detailed predictions of an experience-based account are important both for the subject/object difference in Chinese and for the language-specific forgetting effect. For the forgetting effect, a structural experience account seems to be a promising predictor capturing the divergence of the effects in different languages. Similarly, in the question of the Mandarin extraction preference, structural regularities are considered an important explanatory factor. As discussed in 1.3, the problem of theories involving structural experience or canonicity is the justification of certain granularity commitments. Chapter 3 emphasized that an implementation of the experience account in a connectionist network model deals with the granularity problem in a natural way, leaving it to the learning process to extract structural information at the granularity level that best serves the solution of the task. The network model by MacDonald and Christiansen (2002) (MC02), discussed in chapter 3 as a prototypical structural experience implementation, has proven to make empirically consistent predictions regarding the subject/object difference and individual differences. On that basis I rebuilt the model and used it to address the above-mentioned issues that were waiting for a connectionist answer. In order to verify my implementation of the model, I present replications of the English and German simulations before reporting the new simulations. But first I will briefly introduce the structure of the network.
4.1 The Model

4.1.1 Network Architecture
As in MC02, the connectionist architecture used was a simple recurrent network (SRN; Elman, 1990) as described in chapter 3. All networks were built, trained, and tested in the Tlearn simulator (Elman, 1992) on a Windows platform. The SRN consisted of four layers.
62
4.1 The Model<br />
The Input Layer In a localist input representation, the number of input nodes depends on the number of words in the lexicon. Each input node represents one word. A word is encoded by setting the activation of the node representing the input word to 1 and all others to 0. The MC02 replication model used 31 input nodes for 30 words and the end-of-sentence marker (EOS).

The Output Layer The output layer had the same number of units as the input layer. Output units, however, could take activations on a continuous scale between zero and one. The output calculation used the cross-entropy method, which guarantees that all output activations sum to one. This makes it possible to map output activations of units directly onto continuation probabilities.
The Hidden Layer The hidden layer holds the internal representations of the network. It consisted of 60 units, which receive an all-to-all connection from the input units and connect in the same way to the output units. Depending on the weights of the incoming connections, hidden units received activations between zero and one. The hidden layer size was held constant over all simulations: as Christiansen and Chater (1999) demonstrated, with layer sizes from about 15 units upward the number of hidden units does not significantly influence the performance of the network on recursive embedding. Hence, for the simulations presented here the size of the hidden and context layers stayed untouched.
The Context Layer The context layer contained 60 units that received a one-to-one connection from the hidden units. To obtain the copy mechanism, the link weights connecting from the hidden layer were fixed to 1. In that way, in every time step the context units received an exact copy of the hidden layer's activations. In an SRN the back-projection into the hidden layer happens in an all-to-all fashion, thus providing the next input step with indirect context from previous calculations.
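The interplay of the four layers can be sketched as a single time step in code. This is a minimal illustration of the architecture described above, not the Tlearn implementation; the weight names, the random seed, and the constant bias term are my own choices.

```python
# Minimal sketch of one SRN time step (illustrative, not the Tlearn code).
# Sizes follow the architecture above: 31 input/output units, 60 hidden
# and 60 context units.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 31, 60

# Small random weights, here drawn from [-0.15, 0.15] (cf. section 4.1.3).
W_ih = rng.uniform(-0.15, 0.15, (n_hid, n_in))    # input   -> hidden
W_ch = rng.uniform(-0.15, 0.15, (n_hid, n_hid))   # context -> hidden (all-to-all)
W_ho = rng.uniform(-0.15, 0.15, (n_in, n_hid))    # hidden  -> output

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def step(word_index, context):
    """Process one word; return the output distribution and new context."""
    x = np.zeros(n_in)
    x[word_index] = 1.0                                 # localist (one-hot) input
    hidden = sigmoid(W_ih @ x + W_ch @ context + 0.5)   # 0.5: bias (illustrative)
    output = softmax(W_ho @ hidden)                     # sums to 1: continuation probs
    return output, hidden.copy()                        # context = copy of hidden

context = np.zeros(n_hid)
for w in [3, 7, 12]:                                    # arbitrary word indices
    output, context = step(w, context)

print(round(float(output.sum()), 6))  # 1.0: a probability distribution
```

The copy into the context layer with fixed weights of 1 appears here simply as `hidden.copy()`; the all-to-all back-projection is the `W_ch @ context` term of the next step.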
4.1.2 Grammar and Corpora

Simple probabilistic context-free grammars (PCFGs) were used, covering simple sentences and subject- and object-extracted RCs in all three languages: English, German, and Chinese. For standard English and German training I used the grammars designed by Lars Konieczny (English) and by Daniel Müller and Lars Konieczny (German). The relative clause distribution was adjusted across different experiments. For generating corpora and likelihood predictions, the Simple Language Generator (SLG; Rohde, 1999) was used. The three training grammars as represented in SLG can be found in the Appendix. Every training corpus consisted of 10,000 randomly generated sentences. Test corpora were generated for every condition, consisting of 10 test sentences each.
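The corpus generation procedure can be sketched as sampling from a PCFG. The toy grammar below is a stand-in of my own invention, not one of the actual training grammars (which are given in the Appendix); it merely shows the sampling mechanism in the spirit of SLG.

```python
# Sketch of PCFG-based sentence sampling; the grammar is a toy stand-in.
import random

random.seed(1)

# Each nonterminal maps to (probability, expansion) pairs; symbols not in
# the grammar are terminals.
grammar = {
    "S":  [(1.0, ["NP", "V", "EOS"])],
    "NP": [(0.9, ["the", "N"]),
           (0.1, ["the", "N", "RC"])],            # RC attachment probability 0.1
    "RC": [(0.5, ["that", "V", "the", "N"]),      # subject relative
           (0.5, ["that", "the", "N", "V"])],     # object relative
    "N":  [(0.5, ["boy"]), (0.5, ["girl"])],
    "V":  [(0.5, ["sees"]), (0.5, ["sleeps"])],
}

def generate(symbol="S"):
    """Expand a symbol top-down, choosing rules by their probabilities."""
    if symbol not in grammar:
        return [symbol]                            # terminal word
    r, acc = random.random(), 0.0
    for p, expansion in grammar[symbol]:
        acc += p
        if r <= acc:
            return [w for s in expansion for w in generate(s)]
    return []

corpus = [" ".join(generate()) for _ in range(5)]
for sentence in corpus:
    print(sentence)
```

Scaling the loop to 10,000 sentences yields a training corpus of the kind described above, with the RC rate controlled by a single probability in the grammar.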
63
4.1.3 Tra<strong>in</strong><strong>in</strong>g and Test<strong>in</strong>g<br />
Chapter 4 Two SRN Prediction Studies<br />
Prior to training, all networks were initialized with random connection weights in the range [-0.15, 0.15], and the hidden units received an initial bias activation of 0.5. Each training included 10 individually initialized networks that were trained on 10 different corpora, respectively. In doing so, statistical justification was achieved by simulating subjects of differing disposition exposed to non-identical material. The networks were trained for three epochs, where one epoch corresponded to a full run through a corpus.

In the following, the training mechanism is briefly described. With every input word, the weighted connections propagate activation and inhibition through the net, forming an activation pattern in the hidden layer, which in turn is responsible for the output pattern. The error is then calculated with respect to a target that activates only the node corresponding to the subsequent word in the current sentence, with value 1. Thus, the network is trained to deterministically predict the next word, which is, of course, impossible to achieve. Similar input will have different continuations, but the teaching mechanism claims every continuation to be the one and only for the current context. Consequently, after some examples the network will activate several words, with activation strengths qualitatively corresponding to the "teacher's" previous single lessons. In combination with the cross-entropy error calculation (output activations sum to 1), the activation distribution over the output nodes becomes comparable to a probability distribution over words. Here the grammatical prediction error (GPE; Christiansen and Chater, 1999) comes into play. The GPE algorithm is based on the numerical differences between the desired PCFG-corresponding probabilities and the actual output. The GPE value is a difficulty measure for every word in the sentence, which can be used as a reading time predictor.
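The claim that one-hot next-word targets push the outputs toward conditional probabilities can be checked directly, independently of any network: averaging the "deterministic" teaching signals collected for one context recovers the continuation distribution of the corpus. The toy corpus below is invented for illustration.

```python
# Averaging one-hot next-word targets for a fixed context yields the
# conditional continuation distribution -- the signal an error-minimizing
# network converges toward. Toy corpus; the words are illustrative.
from collections import defaultdict

corpus = [
    ["the", "boy", "sees", "EOS"],
    ["the", "boy", "sleeps", "EOS"],
    ["the", "girl", "sees", "EOS"],
    ["the", "boy", "sees", "EOS"],
]

# Collect the one-hot teaching signals, indexed by the preceding word.
targets = defaultdict(lambda: defaultdict(float))
counts = defaultdict(int)
for sentence in corpus:
    for prev, nxt in zip(sentence, sentence[1:]):
        targets[prev][nxt] += 1.0   # each signal says: nxt, with activation 1
        counts[prev] += 1

# The average teaching signal after "boy" is a probability distribution.
after_boy = {w: a / counts["boy"] for w, a in targets["boy"].items()}
print(after_boy)  # 'sees': 2/3, 'sleeps': 1/3
```

Each individual target is wrong as a probability claim, but their average is exactly the corpus-conditional distribution, which is why the trained outputs can be compared against grammar-derived likelihoods.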
The Grammatical Prediction Error The GPE measure as described in Christiansen and Chater (1999) assigns an error score between 0 and 1 to every output activation pattern. With all correct units receiving the correct amount of activation and no incorrect units being activated, the output would receive a GPE value of 0. If, on the other hand, no units are correctly activated, the GPE would be 1. The score is calculated by formulas (4.1)-(4.5), where u_i is the activation of unit i; G and U are the sets of grammatical and ungrammatical units, respectively; and t_i is the desired target activation of unit i. The formulas ensure that incorrectly or insufficiently activated output units are penalized by an increasing score. Since all activations sum to 1, over-activation is also penalized, because the excess activation is missing at another unit. Correctly activated units H (hits) and incorrectly activated units F (false alarms) sum to the total activation. Additionally, M is the sum of all missing activation m_i, which is the discrepancy between an under-predicted unit's activation u_i and the target activation t_i. The target activation t_i of a unit i is given by the likelihood of the respective word in the current context string, calculated from the probabilistic grammar. The GPE is then defined by the proportion of correctly activated units (H) to the total activation plus a penalty for misses (4.5).
Hits (correctly activated units):              H = Σ_{i∈G} u_i                    (4.1)

False alarms (incorrectly activated units):    F = Σ_{i∈U} u_i                    (4.2)

Missing activation of unit i:                  m_i = t_i − u_i if t_i − u_i > 0,
                                                     0 otherwise                  (4.3)

Misses (units with underestimated activation): M = Σ_{i∈G} m_i                    (4.4)

Grammatical prediction error:                  GPE = 1 − H / (H + F + M)          (4.5)
A Perl routine controlled the training and testing of the ten networks and then calculated the region-specific GPEs. The correct functioning of the process will now be validated by the replication of two previous studies.
4.2 Replication of English and German RC Processing
I built the model with the parameters specified by MacDonald and Christiansen (2002) and tried to replicate their results (see figure 3.3 for MC02's results). MC02 report an RC probability of 0.05. However, the replication fitted their data better when the RC probability was set to 0.1. Konieczny and Ruh (2003) also replicated MC02 with an RC probability of 0.1. Figure 4.1 shows the replication result. The pattern of MC02 was matched more exactly in epochs 3, 4, and 5, but the relevant interactions were also found in epochs 1, 2, and 3. Only the training effect on the main verb in the ORC was not very pronounced. The differences were, however, significant.

I used the simplified German grammar from Konieczny and Ruh (2003) to replicate their results. Compared to the original study, I obtained lower error rates for the main verb in both conditions. Additionally, the replication showed a significant experience effect in all regions of the SRC, which was not the case in the original. The pattern by region was successfully matched.

I will not go into details regarding the two replication studies. They merely build the basis for the following simulations, making sure that the model used here has properties similar to those of the models in MacDonald and Christiansen (2002) and Konieczny and Ruh (2003).
65
Figure 4.1: Replication of MacDonald and Christiansen (2002). GPE (0-1) by region for the English SRC ('rep. that attacked the senator praised the judge') and the English ORC ('rep. that the sen. attacked praised the judge') over training epochs 1-3.
4.3 RC Extraction in Mandarin

4.3.1 Simulation 1: Regularity

Model Parameters
The first simulation was intended to assess the degree of the regularity advantage the ORC receives due to its canonical word order. A regularity effect is assessable only when the frequencies of both RC types in the corpora are identical. Therefore, the SRC and the ORC received the same probability in the generation grammar. Although the replications were done with an RC probability of 0.1, I used the original value of 0.05, reported in MacDonald and Christiansen (2002), for the Mandarin regularity simulation. Compared to English, the Chinese grammar used here is very simple. Setting the RC probability too high could speed up the learning process in a way that conceals training effects.
The Grammar
The grammar used to generate the corpora covered simple regular Mandarin SVO sentences as well as SR and OR clauses. A relative clause could attach to every noun with a probability of 0.05. The embedding depth was theoretically unlimited, but with this small attachment probability the longest sentence in the corpora had a length of 16 words. The 17-word lexicon consisted of 9 plural and singular nouns, three transitive and four intransitive verbs, of which one (lijie "understand") belongs to both categories, the relativizer de, and the EOS. Note, however, that there is no number agreement between nouns and verbs in Mandarin. The full lexicon is given in the Appendix. Note further that in normal Mandarin intransitive
[Figure 4.2: Replication of Konieczny and Ruh (2003). GPE (0.0-1.0) by region over training epochs 1-3; left panel: German SRC ("der den Passant trifft verspottet"), right panel: German ORC ("den der Passant trifft verspottet").]
main clauses are closed by the marker le (see example 21). I did not use it in the grammar, however, so as not to blur the transitivity constraints, since le could appear to the network to be a noun.
(21) faguan youyu le .
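The generation procedure described above can be sketched as a small probabilistic rewrite program. The following is a hypothetical reconstruction, not the implementation actually used in the thesis: the lexicon is abbreviated, and only the parameters stated in the text (RC attachment probability 0.05, equal SRC/ORC split) are taken from the description.

```python
import random

# Hypothetical reconstruction of the Mandarin generation grammar
# (abbreviated lexicon): simple SVO main clauses, and every noun may
# take a prenominal relative clause ending in the relativizer "de"
# with probability 0.05.
NOUNS = ["faguan", "lushi", "fayanren", "yinhangjia"]
TRANS = ["biaoyang", "gongji", "lijie"]      # transitive verbs
INTRANS = ["youyu", "sahuang", "lijie"]      # intransitive ("lijie" is both)

RC_PROB = 0.05   # probability of RC attachment at a noun
SRC_PROB = 0.5   # equal SRC/ORC split, as in simulation 1

def noun_phrase():
    """A noun, optionally preceded by a relative clause; RCs may nest,
    so the embedding depth is theoretically unlimited."""
    head = random.choice(NOUNS)
    if random.random() < RC_PROB:
        if random.random() < SRC_PROB:
            rc = [random.choice(TRANS)] + noun_phrase() + ["de"]  # SRC: V N de
        else:
            rc = noun_phrase() + [random.choice(TRANS), "de"]     # ORC: N V de
        return rc + [head]
    return [head]

def sentence():
    """One SVO (or SV) sentence, closed by the end-of-sentence marker."""
    verb = random.choice(TRANS + INTRANS)
    words = noun_phrase() + [verb]
    if verb in TRANS:
        words += noun_phrase()
    return words + ["."]

corpus = [sentence() for _ in range(10000)]
```

With these settings, sentences longer than a dozen words are rare, in line with the 16-word maximum reported above.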
Ten networks were trained on ten randomly generated corpora. Three test corpora were randomly generated, containing simple transitive SVO sentences, SRCs, and ORCs, respectively. The RC test corpora contained only singly embedded RCs with transitive verbs. It was ensured that none of the test sentences appeared in the training sets.
(22) Test set examples:
a. gongji fayanren de lvshi biaoyang lvshimen . (SRC)
b. fayanren lijie de lvshi biaoyang yinhangjia . (ORC)
The SRN's task was to predict the next word in a sentence. For example, in an SRC, when the verb biaoyang "praise" has been seen and the noun lushi "lawyer" is now presented, the target activation for the relativizer de is 0.975. The activation pattern is shown in figure 4.3. Besides the relativizer, the transitive verbs are expected to be activated, because there is a low but nonzero probability of an ORC modifying the object inside the SRC (see example 23); this is in fact the only possible continuation apart from de following a 'V N' sequence. In the ORC, by contrast, following the first two words 'N V', all nodes apart from the EOS are activation targets because many different continuations are possible.
(23) [V1 [N1 V2 de ORC ] N2 de SRC ] N3
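The prediction setup can be illustrated with a minimal Elman-style SRN. This sketch is generic (numpy, one hidden layer whose state is copied back as context, a softmax output layer trained with one-step backpropagation on next-word prediction); the network size, learning parameters, and training regime of the thesis are not reproduced here.

```python
import numpy as np

class SimpleSRN:
    """Minimal Elman-style simple recurrent network: the hidden state is
    copied back as context input at the next time step, and the output
    is a softmax distribution over the lexicon (next-word prediction)."""

    def __init__(self, vocab_size, hidden_size=10, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W_ih = rng.normal(0.0, 0.1, (hidden_size, vocab_size))   # input -> hidden
        self.W_hh = rng.normal(0.0, 0.1, (hidden_size, hidden_size))  # context -> hidden
        self.W_ho = rng.normal(0.0, 0.1, (vocab_size, hidden_size))   # hidden -> output
        self.lr = lr
        self.h = np.zeros(hidden_size)

    def reset(self):
        """Clear the context units, e.g. at a sentence boundary."""
        self.h = np.zeros_like(self.h)

    def step(self, x):
        """One time step: x is a one-hot input vector; returns the
        predicted distribution over the next word."""
        self.h = np.tanh(self.W_ih @ x + self.W_hh @ self.h)
        z = self.W_ho @ self.h
        e = np.exp(z - z.max())
        return e / e.sum()

    def train_step(self, x, target):
        """Forward step plus one-step (truncated) backpropagation of the
        cross-entropy error against the target distribution."""
        h_prev = self.h.copy()
        y = self.step(x)
        d_out = y - target                        # softmax/cross-entropy gradient
        d_h = (self.W_ho.T @ d_out) * (1.0 - self.h ** 2)
        self.W_ho -= self.lr * np.outer(d_out, self.h)
        self.W_ih -= self.lr * np.outer(d_h, x)
        self.W_hh -= self.lr * np.outer(d_h, h_prev)
        return y
```

After each sentence the context is reset; the output vector after a given prefix can then be compared against the grammar-derived target probabilities, as in figure 4.3.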
Rec-Act  Target-Act  Miss        Chin.            Engl.
0.688    0           0           (EOS)            (EOS)
0.003    0.00833333  0.00533333  (biaoyang)       (praise)
0.000    0           0           (dadianhua)      (phone)
0.713    0.975       0.262       (de)             (gen)
0.001    0           0           (faguan)         (judge)
0.001    0           0           (faguanmen)      (judges)
0.001    0           0           (fayanren)       (reporter/reporters)
0.003    0.00833333  0.00533333  (gongji)         (attack)
0.001    0           0           (guanyuan)       (senator)
0.001    0           0           (guanyuanmen)    (senators)
0.002    0.00833333  0.00633333  (lijie)          (understand)
0.001    0           0           (lushi)          (lawyer)
0.001    0           0           (lushimen)       (lawyers)
0.000    0           0           (sahuang)        (lie)
0.001    0           0           (yinhangjia)     (banker)
0.001    0           0           (yinhangjiamen)  (bankers)
0.000    0           0           (youyu)          (hesitate)
H: 0.721, F: 0.697, GPE: 0.6025
Figure 4.3: Output node activations on the relativizer in Mandarin after the input sequence 'biaoyang lushi'.
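The quantities reported in figure 4.3 (hits H, false alarms F, misses, and the resulting GPE) follow Christiansen and Chater's scoring scheme. The sketch below is a simplified formulation, not a verbatim reproduction of their formula: hits are the activations of grammatical (target > 0) units, false alarms are the activations of ungrammatical units, and each miss is the shortfall of an under-activated grammatical unit, matching the miss values listed in the figure. The exact miss weighting in the original may differ, so this version need not reproduce the reported GPE of 0.6025 exactly.

```python
def gpe(activations, targets):
    """Grammatical Prediction Error, in a simplified form of Christiansen
    and Chater's (1999) hits/false-alarms/misses scheme:
        GPE = 1 - hits / (hits + false_alarms + misses).
    `activations` are the network's output values; `targets` are the
    grammar-derived next-word probabilities (zero for ungrammatical words)."""
    hits = false_alarms = misses = 0.0
    for a, t in zip(activations, targets):
        if t > 0:                       # grammatical continuation
            hits += a
            misses += max(t - a, 0.0)   # shortfall below the target activation
        else:                           # ungrammatical continuation
            false_alarms += a
    return 1.0 - hits / (hits + false_alarms + misses)
```

A perfect match between activations and targets yields a GPE of 0; concentrating all activation on ungrammatical words yields 1.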
Predictions
The aim of the first simulation was to verify the regularity (or canonicity) argument for Mandarin object relatives. As Hsiao and Gibson (2003) have stated, ORCs have a more canonical word order than SRCs and should therefore be easier. The expected outcome would be a pattern reversed relative to the English and German results: the performance on the ORC should be relatively good and stable from the first training cycle on, whereas the SRC performance should improve throughout training.
Results
Words 2 to 5 (N1/V1 de N2 V2 N3) were selected as regions of interest. Since the assessed performance is prediction, there is no point in testing the initial word of a sentence: the network's prediction for the initial word will always be the same, and its divergence from the desired prediction tells us nothing about structural matters.
The GPE score measured on a given word only tells us how well the predictions based on the previous words fit the probabilistic grammar. It does not include any effect of the current word itself.
[Figure 4.4: Simulation 1: Mandarin ORC regularity. GPE (0.0-1.0) by region over training epochs 1-3; left panel: Mandarin SRC (regions N1, de, N2, V2, N3), right panel: Mandarin ORC (regions V1, de, N2, V2, N3).]
See figure 4.4 for GPE scores of SRCs and ORCs by training epoch. For means and standard errors see table A.1 in the appendix. Collapsing over all regions and epochs, there was a significant advantage for object relatives. The difference shrank with increasing epochs. For the ORC there was significant improvement on the main verb over the three epochs; the SRC improved on the main verb and the relativizer. On the first region (N1/V1) there was a marginal advantage for the ORC in the first epoch. The second region (de) showed a significant object advantage in all epochs. There was also an object advantage on region 4 (V2), which, however, disappeared after the second epoch due to SRC improvement. Regions three and five did not show any effect.
The results of experiment 1 showed the predicted frequency × regularity interaction. In contrast to the English results of MC02, the regularity effect in Mandarin is seen in object relatives. The effect, however, is not located on the embedded RC but mainly on the relativizer. It seems that the predictions for position 4 (here the relativizer) are easier for the ORC because of the familiarity with the sequence 'N V ...', where the relativizer should have a quite low continuation probability due to the small RC frequency in the corpus. The SRC sequence 'V N', on the other hand, occurs very rarely at the beginning of a sentence, making more training necessary to learn the correct predictions. Over training, the network has to learn to assign a high activation to the relativizer after a 'V N' sequence and to exclude almost all other words as continuations.
Experiment 1 superficially confirms the ORC regularity hypothesis. However, as the
corpus study by Kuo and Vasishth (2007) revealed, there are many more structures in the corpus that resemble the SRC-typical 'V N de N' sequence. It must be recognized that the granularity problem also applies to connectionist networks, namely in the choice of input structures. A structure may be regular with respect to main clauses; but once a large number of structures that differ fundamentally from the main clause are taken into account, their word order has considerable influence on the respective regularity. German, for example, is considered to exhibit an SOV regularity although most main clauses are built with SVO. Experiment 2 will assess the qualitative impact of a displaced regularity in favor of the SRC structure.
4.3.2 Simulation 2: Frequency
Model Parameters
Hsiao and Gibson (2003) performed a corpus study yielding only a small difference in the occurrence of subject vs. object relatives: of all RCs they found, about 57.5% were SRCs and 42.5% were ORCs. However, as reported in section 2.2, there are more constructions with an RC-like pattern that are in fact not RCs. The purely syntactic account of structural frequency addressed here does not distinguish between homomorphic sequences yielding different interpretations. Consequently, the corpus frequencies of all RC-like structures are considered in this experiment. The corpus study by Kuo and Vasishth (2007) found 639 occurrences of SRC-like sequences such as 'V N1 de N2' and only 117 of 'N1 V de N2' (ORC). That makes a total of 756 RC-like structures, of which about 84.5% exhibit the SRC pattern. In addition, the possibility of main clauses with elided subjects is considered an influence on early parsing decisions (Kuo and Vasishth, 2007). This could further increase the familiarity with 'V N ...' sequences and hence facilitate the SRC. To account for the high number of SRC-like structures, the probability of RC attachment in the grammar was set to 0.1 and the SRC probability was set to 0.85. In this way, the number of ORCs in the corpus was only slightly lowered, whereas the number of SRCs increased by about 60%. Extra main clauses with missing subjects were not added to the grammar. No further changes were made.
Predictions
The implemented discrepancy between ORC and SRC frequency should in principle account for the structural properties of the corpus, decreasing the familiarity effect on ORCs. The SRC pattern should be easier to predict on the embedded noun and the relativizer. Thus, the object advantage seen in simulation 1 should decrease. The size of the effect is expected to be small, because the distributional changes to the RC probability are only a statistical approximation of the given corpus data.
Results
Figure 4.5 shows the GPE scores by region and epoch for SRCs and ORCs. Table A.2 (in the appendix) shows means and standard errors for the first two regions. The improvement on the main verb in the ORC over epochs was comparatively low. There was no improvement on the SRC main verb. The training improvement on the relativizer in the SRC happened predominantly between the first and the second epoch. An object advantage on the relativizer was present only in the first epoch and, slightly, in the second. The third epoch did not reveal an ORC advantage. In addition, a subject preference was found on the pre-relativizer region in the second and third epochs (p < 0.001).
[Figure 4.5: Simulation 2: Mandarin SRC frequency. GPE (0.0-1.0) by region over training epochs 1-3; left panel: Mandarin SRC (85%) (regions N1, de, N2, V2, N3), right panel: Mandarin ORC (15%) (regions V1, de, N2, V2, N3).]
The greatly improved predictions for the relativizer in the SRC imply an increased familiarity effect due to the corpus containing more SRCs than before. The result of simulation 2 suggests that the regularity effect on object relatives is weak and can easily be overridden by a slight distributional disproportion in favor of the SRC structure.
4.3.3 Discussion
Simulation 1 confirms the regularity advantage for Mandarin ORCs relative to simple sentences, with the greatest effect on the relativizer. However, the location of the effect is not consistent with human data. Recall that Hsiao and Gibson (2003) found an object preference on the pre-relativizer region, whereas Kuo and Vasishth (2007) found a subject preference on the relativizer and the head noun. Apart from the very small effect on N1/V1 in the simulation, there is no region-specific consistency with empirical studies.
The impact of structural regularity is rather disconfirmed by the several studies finding a subject advantage on the relativizer.
Changing the RC type proportions in favor of the SRC in simulation 2 decreased the object advantage dramatically. The RC region showed a subject advantage after two training epochs. This finding is also inconsistent with human data: in empirical studies, only an object preference was found on the RC region (Hsiao and Gibson, 2003; Lin and Garnsey, 2007; Qiao and Forster, 2008). The assessment of frequency effects in simulation 2 should be understood as a tentative approach to capturing the complex interplay of statistical constraints that drive learning. Direct predictions for sentence processing patterns may, however, not be justified. An SRN-based regularity test like that in simulation 1 is more or less straightforward as long as the structures in question are clearly defined. But the structural choice may not reflect the regularity relations that actually influence skill in human readers. In order to obtain more precise predictions, further corpus inspection is necessary. For example, the exact proportion of RC-like structures or elided-subject clauses relative to the whole corpus was neglected in the present study but could have influenced the results.
Note that the regularity pattern in Mandarin as revealed by the simulations is not easily comparable to the English simulation. In English, difficulty effects occurred mainly on the verbs. This is due to the number agreement between subject and predicate; notably, no agreement is necessary between the verb and its direct object. This agreement pattern delivers, as a side effect, a sort of semantic information comparable to thematic roles. Agreement thereby gives rise to a simulation of integration difficulty effects, arising from the need to relate verbs to their subjects. I hypothesize that these "integration effects" are the main reason for the good by-region fit to human data. Mandarin, on the other hand, does not contain specific noun-verb dependencies. In a sense, the network merely needs to count nouns and verbs instead of establishing pairwise relationships. Thus, the Mandarin-trained network is not required to deal with the concept of a sentential subject. Consequently, no "integration difficulty" comparable to English is expected. Of course, this does not correspond to human processing of Mandarin: predicates and their arguments are indeed involved in dependencies such as thematic roles and other semantic relationships. It is conceivable that the lack of specific noun-verb relationships is the reason for the absent pattern match between the Mandarin simulation and human data. Implementing the missing dependencies along the lines of the simplified English grammar seems a straightforward way to test that hypothesis. The interpretability of the results of such a simulation would, however, be questionable.
A possible interpretation of the overall contradictory simulation results with respect to human data is that the effects observed here do not in fact reflect experience-relevant regularities. Assuming, on the other hand, that the simulation results do show regularity properties that play a role in human sentence comprehension, there are two possible interpretations: a) if regularity plays a role in the extraction preference, the very weak regularity effect on the ORC in the simulations could be one of the reasons for the inconclusive empirical results; b) on
the other hand, it is possible that regularity has no relevant impact in empirical studies of Mandarin extraction preferences, and the explanation is left to other factors.
4.4 Forgetting Effects
4.4.1 The Model
As presented in chapter 3, the forgetting effect in center-embedded structures was addressed in a connectionist study by Christiansen and Chater (1999). They trained an SRN on right-branching and center-embedding structures and then assessed the output node activations after the network had seen the sequence NNNVV. The activations showed a clear 2VP preference, consistent with empirical data from English speakers. The artificial language, which covered center-embedding abba and right-branching aabb dependency patterns, is directly comparable to the simple English grammar of object and subject relative clauses used by MacDonald and Christiansen (2002). Thus, it should be possible to replicate the effect with the SRNs trained on the English grammar for the replication in section 4.2. In German RCs, however, no real right-branching occurs, since the embedded RC is always attached to its head noun. Hence, in the German grammar used in section 4.2, both ORC and SRC exhibit a center-embedding abba pattern. As a consequence, an SRN exposed to the German grammar is trained more extensively on verb-final center-embedding structures than its English counterpart, which may result in different predictions for an NNNVV sequence. Supposing that the difference in SRC realization in the corpora approximately reflects an essential word-order regularity difference between German and English, the SRN predictions will shed light on the part that experience plays in the explanation of the forgetting effect.
I extended the study by Christiansen and Chater (1999) to obtain GPE values for both conditions on all regions after the missing verb. This required a grammar that simulates the forgetting effect, i.e., one that allows NNNVV sequences to be complete. Thus, in the probability table for the drop-V2 testing corpus, the column referring to the position of V2 was deleted. In consequence, the testing probabilities corresponded to an 'N1 N2 N3 V3 V1' grammar, with the first verb (V3) bound to N3 by number agreement and the second verb (V1) to N1. This is equivalent to forgetting the prediction induced by N2. The GPE for the ungrammatical condition was calculated against these drop-V2 probabilities. So if the network makes grammatical predictions, the error values for V1 and subsequent regions should be higher in the drop-V2 condition: on V1 the SRN would predict a verb in number agreement with N2; then the network would predict another verb, whereas the test grammar predicts the determiner. After this point, the network's predictions should be completely confused, because the sequence just observed is inconsistent with any structural generalizations developed during training. If the network's predictions are not too locally dependent, the predictions should be wrong for the last word (the direct object of the main clause), too.
Under the forgetting hypothesis, however, the GPE values would look different. For the SRN, the forgetting hypothesis means that it is unable to make correct predictions based on long-distance dependencies and instead bases its predictions on locally consistent sequences. For example, after seeing V3 the network predicts only one more verb, because the observation of N1 is too weakly encoded in the hidden representations to influence the predictions. Consequently, on V1 the error for the drop-V2 condition should be lower, because in the grammatical condition V1 is the third verb, which is inconsistent with the SRN's predictions. The 2VP preference should continue on the post-V1 regions, because a locally coherent context with two verbs is easier to handle than a context with three verbs.
Vasishth et al. (2008) mentioned comma insertion as a potential factor: commas could serve as structural cues alerting the reader to a missing verb. Empirically, however, it is hardly possible to separate the comma effect from word-order effects. Vasishth and colleagues did test English readers on comma-containing stimuli; but since English readers are not trained on commas used in this way, they cannot draw as much information from comma positions as German readers do. In order to test whether the commas in fact influence structural predictions, the following study tested SRNs trained on German and English corpora both with and without commas.
4.4.2 Simulation 3: English
Model Parameters
For the forgetting-effect simulation of English without commas (simulation 3a), no new training was necessary: the SRNs trained on the English corpora were tested on the grammatical and the ungrammatical condition in their states after one, two, and three epochs. For simulation 3b, the English grammars for the training and testing corpora were enriched with commas, and the SRNs were trained and tested in the usual way.
3a: English without commas
Given the equivalence of Christiansen and Chater's training language and the English training grammar used here, the effects should be similar. In particular, the V1 and post-V1 regions should receive lower GPE values in the drop-V2 condition.
(24) Example test sentences:
a. the judge that the reporters that the senators understand praise attacked the senators . (no-drop)
b. the judges that the reporters that the lawyer praised attacked the senators . (drop-V2)
Results for 3a
As in the experiment in Vasishth et al. (2008), the assessed regions in the simulation were the three verbs V3, V2, V1 and the post-V1 region. The V2 region contains no data point in the ungrammatical condition, because the verb is dropped in the testing stimuli. Figure 4.6 shows GPE values for the SRNs trained and tested on the simplified English grammar without commas. The left panel shows both conditions after two training epochs; the right panel shows the results after three full corpus runs. The ungrammatical condition is labeled drop-V2 and the grammatical condition no-drop. The pattern was exactly as expected: the SRNs predicted a drop-V2 advantage on V1 and post-V1. No effect was predicted on V3, because at this point there is no difference in stimuli or probability between the conditions.
[Figure 4.6: Simulation 3a: English doubly-embedded ORC. GPE (0.0-1.0) on the three verbs (V3, V2, V1) and the subsequent region (post-V1) for the grammatical (no-drop) and ungrammatical (drop-V2) conditions; left panel: after two epochs of training, right panel: after three epochs.]
3b: English with commas
The commas serve as clause-boundary markers. In English SRCs they appear only after nouns. In the ORC, on the other hand, commas appear after the nouns at the beginning of the sentence and after the verbs at the end. In a doubly-embedded ORC there would be a comma after V3 and V2. Thus, the grammatical/ungrammatical sequence pair is no longer NNNVVV vs. NNNVV but rather N,N,NV,V,V vs. N,N,NV,V. The comma is a category with only one token, which attaches to nouns or verbs and is not involved in long-distance dependencies; hence, the activation pattern representing it should not be too complex. In fact, learning comma usage in ORCs can be reduced to a counting-recursion problem of the pattern aabb instead of abba. As discussed in chapter 3, counting recursion is the easiest of the three recursion types for both humans and connectionist networks (Christiansen and Chater, 1999). Thus, it is very likely that the inclusion of commas facilitates processing in the grammatical condition, lowering the respective GPE values.
(25) English with commas:<br />
a. SRC: S1 , V2 O2 , V3 O3 , V1 O1<br />
b. ORC: S1 , S2 , S3 V3 , V2 , V1 O1<br />
(26) Example test sentences:<br />
a. the banker , that the banker , that the senators phone , understands , attacks<br />
the reporters . (no-drop)<br />
b. the lawyer , that the senator , that the judges attack , praises the judge .<br />
(drop-V2)<br />
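The grammatical/ungrammatical sequence pair illustrated in (25) and (26) can be generated schematically; the following sketch uses an invented helper name and only reproduces the category patterns, not the lexicalized test items:<br />

```python
def english_orc(depth=2, drop_v2=False):
    """Category sequence of an English ORC with `depth` embeddings:
    commas follow all but the last subject noun and all but the last verb;
    drop_v2=True omits the middle verb (the ungrammatical condition)."""
    nouns = ['N', ','] * depth + ['N']           # S1 , S2 , S3
    n_verbs = depth if drop_v2 else depth + 1    # V3 , V2 , V1 (or one fewer)
    verbs = []
    for i in range(n_verbs):
        verbs.append('V')
        if i < n_verbs - 1:
            verbs.append(',')
    return nouns + verbs + ['N']                 # final object O1

# the pair tested in simulation 3b:
no_drop = english_orc()              # N , N , N V , V , V N
drop_v2 = english_orc(drop_v2=True)  # N , N , N V , V N
```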
Results for 3b<br />
See figure 4.7 for the results of simulation 3b after two (left panel) and three epochs (right panel). Compared to simulation 3a there was a global improvement in both conditions. The most dramatic improvement occurred on V3, which is predicted almost without error after three epochs. In the earlier epoch, comma insertion produced more improvement on V1 in the grammatical condition; as a result, the V1 error was the same in both conditions. With further training, however, the no-drop condition did not change on V1, whereas the drop-V2 condition improved further, resulting in a drop-V2 preference on V1. The opposite happened on post-V1, where training affected the no-drop condition more; here, training did not affect the ungrammatical condition at all. In summary, there was a comma insertion × condition × training interaction, resulting in a drop-V2 preference after completed training. The stable error on post-V1 in the drop-V2 condition can be interpreted as a floor effect: the prediction of the determiner and the noun is already very good, with a GPE value around 0.1, and it is very unlikely that the SRN would learn the perfectly correct probabilities, yielding a GPE value of zero, even after many epochs. Therefore, on the post-V1 region, improvement through training is only possible for the slightly worse grammatical condition, which is why the two conditions settle on the same error value after three epochs. In conclusion, the insertion of commas clearly helps the network to make better predictions. However, training effects seem to be driven by rather local consistency, affecting the ungrammatical condition more than the grammatical one. Thus, judging from V1 after three epochs, the drop-V2 preference seems to be stable for English center-embedding.<br />
Figure 4.7: Simulation 3b: English doubly-embedded ORC with added commas. The graphic shows the GPE value on the three verbs and the subsequent region of the grammatical (no-drop) and ungrammatical (drop-V2) condition. The left panel shows GPE after two epochs of training, the right panel after three epochs.<br />
4.4.3 Simulation 4: German<br />
Model Parameters<br />
Simulation 4a tested German center-embedding with commas, using the already trained networks from section 4.2. For simulation 4b, training corpora created from a German grammar without commas were used. The testing corpora were built analogously to simulation 3.<br />
4a: German with commas<br />
The German grammar exhibits a regularity of verb-finality in RCs. This differs from the English grammar and should enable the SRN to distinguish 2VP and 3VP embeddings better than in English. As seen in the English simulation, commas have a facilitating effect, although the drop-V2 preference returned after further training. In German, commas could have an even greater facilitating effect, because the counting-recursion pattern aabb is applicable not only in the ORC, as in English, but also in the SRC, since both are center-embedded in German. As example (27) illustrates, SRC and ORC contain the exact same pattern of nouns, verbs, and commas. Consequently, the SRN trained on the German corpus should be very skilled at center-embedding recursion and the comma counting-recursion and hence should have much lower error rates in the grammatical condition.<br />
(27) German with commas:<br />
a. SRC: S1 , O2 , O3 V3 , V2 , V1 O1<br />
b. ORC: S1 , S2 , S3 V3 , V2 , V1 O1<br />
(28) Example test sentences:<br />
a. der Polizist , den der Mensch , den der Polizist verspottet , ruft , verspottet<br />
den Jungen . (no-drop)<br />
b. der Polizist , den der Junge , den der Polizist verspottet , ruft den Menschen<br />
. (drop-V2)<br />
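The point that German SRCs and ORCs share a single comma pattern, while the English patterns diverge, can be stated directly from (25) and (27); the patterns below are copied from those examples:<br />

```python
# Category patterns from examples (25) and (27);
# N = noun, V = verb, ',' = clause boundary marker.
english = {
    'SRC': 'N , V N , V N , V N'.split(),    # S1 , V2 O2 , V3 O3 , V1 O1
    'ORC': 'N , N , N V , V , V N'.split(),  # S1 , S2 , S3 V3 , V2 , V1 O1
}
german = {
    'SRC': 'N , N , N V , V , V N'.split(),  # S1 , O2 , O3 V3 , V2 , V1 O1
    'ORC': 'N , N , N V , V , V N'.split(),  # S1 , S2 , S3 V3 , V2 , V1 O1
}

# In German both relative clause types reduce to the same
# counting-recursion pattern, so the comma is a consistent
# structural cue; in English the two types diverge.
assert german['SRC'] == german['ORC']
assert english['SRC'] != english['ORC']
```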
Results <strong>of</strong> 4a<br />
The results of simulation 4a (German with commas) are shown in figure 4.8. There was a dramatic improvement on V2 and V1 compared to English. Interestingly, the comparison by condition did not reveal any difference on the main verb. However, a slight difference between conditions was found on the post-V1 region; this drop-V2 preference was significant (p < 0.001).<br />
Figure 4.8: Simulation 4a: German doubly-embedded ORC with commas. The graphic shows the GPE value on the three verbs and the subsequent region of the grammatical (no-drop) and ungrammatical (drop-V2) condition after two and three epochs of training.<br />
4b: German without commas<br />
Since simulation 3b provided evidence for a comma effect, the removal of commas should make the SRN's predictions more error-prone. The verb-finality regularity in German, however, should still cause better predictions for the grammatical condition in German than in English. Simulation 4b tested SRNs trained on a comma-free German grammar.<br />
Figure 4.9: Simulation 4b: German doubly-embedded ORC without commas. The graphic shows the GPE value on the three verbs and the subsequent region of the grammatical (no-drop) and ungrammatical (drop-V2) condition after two and three epochs of training.<br />
(29) Example test sentences:<br />
a. der Polizist den der Mensch den die Passanten treffen ruft verspottet den<br />
Jungen . (no-drop)<br />
b. der Passant den der Junge den der Polizist ruft beschimpft die Passanten .<br />
(drop-V2)<br />
Results <strong>of</strong> 4b<br />
The GPE values of the simulation involving German without commas (figure 4.9) show a pattern similar to English without commas. In the earlier epoch a drop-V2 preference was found, with a small effect on V1 and a very pronounced effect on the following region. After completed training, V1 and post-V1 show a drop-V2 advantage of similar size. Overall, the presence of commas appears to be the most relevant factor. Surprisingly, the regularity of verb-final structures does not seem to support correct predictions in German more than in English. Rather, the more regular application of commas in German has a strongly facilitating effect on both conditions, slightly more so on the grammatical one. The impact of comma usage and a comparison of the results to human data will be the topic of the next section.<br />
4.4.4 Discussion<br />
The results <strong>of</strong> simulation 3a (English without commas) and 4a (German with commas)<br />
were consistent with empirical studies (Gibson and Thomas, 1999; Christiansen and<br />
MacDonald, 1999; Vasishth et al., 2008), suggest<strong>in</strong>g a difference <strong>in</strong> forgett<strong>in</strong>g behavior<br />
between German and English. A reliable grammaticality preference <strong>in</strong> German as<br />
observed <strong>in</strong> Vasishth et al. (2008) could, however, not be replicated. Simulation 3a is perfectly<br />
consistent with the simulation <strong>of</strong> the forgett<strong>in</strong>g effect <strong>in</strong> Christiansen and Chater<br />
(1999) and the human data from Vasishth et al. The results prove that not only the<br />
limited grammar used <strong>in</strong> Christiansen and Chater (1999) predicts the forgett<strong>in</strong>g effect<br />
but also the more complex grammar used here. The <strong>in</strong>herent architectural constra<strong>in</strong>ts<br />
<strong>of</strong> SRNs predict a forgett<strong>in</strong>g effect <strong>in</strong> English doubly embedded ORCs. The <strong>in</strong>sertion <strong>of</strong><br />
commas <strong>in</strong> simulation 3b had an effect on the predictions, whereas <strong>in</strong> contrast, the study<br />
by Vasishth and colleagues showed no effect. The miss<strong>in</strong>g effect <strong>of</strong> commas <strong>in</strong> their study<br />
could be expla<strong>in</strong>ed by the fact that English readers are not familiar with us<strong>in</strong>g commas<br />
<strong>in</strong> that way. On the other hand, the SRNs were tra<strong>in</strong>ed on corpora conta<strong>in</strong><strong>in</strong>g commas<br />
and, thus, had learned how to use them as structural cues. The rema<strong>in</strong><strong>in</strong>g drop-V2<br />
preference on V1 still shows a certa<strong>in</strong> consistency between the simulation and empirical<br />
data. More importantly, the model makes different predictions <strong>in</strong> German. Simulation<br />
4a shows a similar performance on grammatical and ungrammatical sentences. In comparison,<br />
the Vasishth et al. study found <strong>in</strong> fact faster read<strong>in</strong>g times <strong>in</strong> the grammatical<br />
condition for German readers. This is not predicted by the model, but it is the difference<br />
to English that is important here. Surpris<strong>in</strong>gly, the SRNs tra<strong>in</strong>ed on German without<br />
commas performed no better than the SRNs tra<strong>in</strong>ed on English without commas. This<br />
yields a comma × language <strong>in</strong>teraction. The greater effect <strong>of</strong> commas <strong>in</strong> German is<br />
expla<strong>in</strong>able by the different comma patterns <strong>in</strong> both languages (see examples 25 and<br />
27). So, the different predictions regard<strong>in</strong>g the forgett<strong>in</strong>g effect seem to be caused only<br />
<strong>in</strong>directly by word order regularities. In particular, the consistent center-embedd<strong>in</strong>g (or<br />
count<strong>in</strong>g-recursion) <strong>of</strong> commas <strong>in</strong> German makes them a reliable predictor, whereas this<br />
is not the case <strong>in</strong> English. The word order itself, however, did not have the expected<br />
effect, as experiment 4b shows. Controll<strong>in</strong>g for the comma effect the head-f<strong>in</strong>iteness <strong>of</strong><br />
SRCs <strong>in</strong> the simplified German grammar does not <strong>in</strong>crease the performance on double<br />
center-embedd<strong>in</strong>g on V1 and post-V1. Maybe doubly-embedded RCs are too rare <strong>in</strong><br />
the corpus to cause an effect. Another explanation could be the particularly <strong>in</strong>creased<br />
complexity <strong>in</strong> the prediction <strong>of</strong> German embedded RCs. In a German RC the cues establish<strong>in</strong>g<br />
the agreement <strong>of</strong> the embedded verb are very subtle compared to English.<br />
In English the word order <strong>of</strong> ‘who NP VP’ versus ‘who VP NP’ decides the agreement,<br />
whereas <strong>in</strong> German there are several possible pair<strong>in</strong>gs <strong>of</strong> der, den, and die, that determ<strong>in</strong>e<br />
the verb agreement. Thus, the verb agreement prediction requires a complex<br />
representation <strong>of</strong> previous <strong>in</strong>put. Given the architectural limits <strong>of</strong> the network, highly<br />
complex representations are <strong>in</strong> a trade<strong>of</strong>f with memory span. That means that distant<br />
dependencies and verb predictions are very hard to ma<strong>in</strong>ta<strong>in</strong>. In other words the trace<br />
80
4.5 Conclusion<br />
<strong>of</strong> the VP-predict<strong>in</strong>g NP <strong>in</strong> the representational cycle <strong>of</strong> the SRN decays faster. This <strong>in</strong><br />
turn is compensated by <strong>in</strong>creased tra<strong>in</strong><strong>in</strong>g on center-embedd<strong>in</strong>g compared to the English<br />
simulation, result<strong>in</strong>g <strong>in</strong> comparable error values when tra<strong>in</strong>ed without commas. This<br />
is, <strong>of</strong> course, an ad-hoc hypothesis and needs further <strong>in</strong>vestigation, which is beyond the<br />
scope <strong>of</strong> this thesis.<br />
4.5 Conclusion<br />
This thesis investigated the explanatory power of a particular implementation of the experience account. The well-established SRN modeling approach of MacDonald and Christiansen (2002) was adopted to test its predictions on two phenomena currently discussed in the literature: the RC extraction type preference in Mandarin and the forgetting effect in complex center-embedding. At first, the two problems were approached theoretically, reviewing results of empirical studies and discussing potential predictions of available theories. Concerning the Mandarin relative clauses, the studies showed exceptionally mixed results. However, an observed object advantage always appeared on the RC region, whereas a subject advantage was found only on the relativizer/head noun region. That fact and the experiment by Qiao and Forster (2008) suggest that Mandarin Chinese might have to be counted as an exception to a universal subject preference. The results for the forgetting effect, on the other hand, were very clear and best explained by language-specific experience.<br />
In chapter 3 the simple recurrent network was introduced and its properties were discussed. An SRN is a very simple and domain-unspecific model, but it accounts for the three necessities introduced at the beginning of this thesis: a) biological factors (architectural limits), b) continuous gradedness in performance, and c) experience.<br />
In chapter 4 the experience-theory predictions regarding the two sample problems were assessed in practice. Just as in the discussion and critique of MC02, the simulation results presented here looked promising at first sight, but sub-experiments and detailed data analysis revealed considerable inconsistencies with respect to human data. The Mandarin RC simulation predicted an object preference, but the location of the effect was not consistent with human data. In addition, the second simulation demonstrated that the regularity effect was very weak. It becomes clear that the training material must be carefully chosen in order to guarantee comparability with other simulations and empirical studies. The forgetting effect was predicted to be present in English but not in German, consistent with human data. However, further simulations revealed comma insertion to be the most important factor.<br />
Of course, it has to be clear that a simple network trained on a simple grammar does not learn the same constraints as humans do. These simulations are rather approximations pointing in a certain direction. A noticeable problem of the SRN predictions is their dependence on local coherence, which can also be described as a low memory span. This is, however, mainly a consequence of the specific properties of the learning mechanism and the context loop. As mentioned in the previous chapter, there are other learning mechanisms that can increase the span, although they may be cognitively unmotivated. Interestingly, however, there is evidence that even human readers rely on local coherence in certain structures (Tabor et al., 2004). Another finding is that the simulations reported in Christiansen and Chater (1999), as well as the comma results in simulations 3 and 4 presented here, show that the SRN handles counting recursion better than the other recursion types. That may be the reason for the strong facilitating effect of comma insertion compared to verb-finality. In this connection it should be noted that Rodriguez (2001) claims that SRNs can in fact carry out explicit symbolic counting procedures.<br />
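To make the term concrete: the context loop is simply the copy-back of the previous hidden state that an Elman-style SRN feeds in alongside the current input. A minimal forward-pass sketch follows; the layer sizes and random weights are arbitrary illustrations, not the parameters used in the simulations reported here:<br />

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 10, 5, 10           # arbitrary layer sizes
W_ih = rng.normal(size=(n_hid, n_in))    # input   -> hidden
W_ch = rng.normal(size=(n_hid, n_hid))   # context -> hidden (the loop)
W_ho = rng.normal(size=(n_out, n_hid))   # hidden  -> output (next-word prediction)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def srn_step(x, context):
    """One forward step: hidden state depends on the current input
    AND the previous hidden state fed back as context."""
    hidden = sigmoid(W_ih @ x + W_ch @ context)
    output = sigmoid(W_ho @ hidden)
    return output, hidden                # hidden is copied back as next context

context = np.zeros(n_hid)
for t in range(3):                       # feed a three-word category sequence
    x = np.zeros(n_in)
    x[t] = 1.0                           # one-hot coded word category
    prediction, context = srn_step(x, context)
```

Because the only record of earlier words is this recycled hidden vector, representational complexity and memory span necessarily compete, which is the trade-off discussed in the previous section.<br />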
This work argued for a uniform account of individual and language-specific differences as well as language-independent processing skill. All three can to a considerable extent be attributed to experience with the individual linguistic environment in interaction with architectural preconditions. It can be concluded that much work is necessary before fine-grained experience-based predictions can be obtained for the highly complex task of sentence comprehension. Nevertheless, the literature shows a promising trend towards PDP models of language comprehension, accompanied by the integration of corpus analyses and acquisition data.<br />
Bibliography<br />
J. R. Anderson and C. Lebiere. The Atomic Components of Thought. Lawrence Erlbaum Associates, 1998.<br />
J. R. Anderson, D. Bothell, M. D. Byrne, S. Douglass, C. Lebiere, and Y. Qin. An integrated theory of the mind. Psychological Review, 111(4):1036–1060, 2004.<br />
E. Bach, C. Brown, and W. Marslen-Wilson. Crossed and nested dependencies in German and Dutch: A psycholinguistic study. Language and Cognitive Processes, 1(4):249–262, 1986.<br />
A. D. Baddeley. Human Memory: Theory and Practice. Psychology Press, 1997.<br />
R. C. Berwick and A. S. Weinberg. The Grammatical Basis of Linguistic Performance. MIT Press, 1984.<br />
T. G. Bever. The cognitive basis for linguistic structures. In Cognition and the Development of Language, 279, 1970.<br />
M. S. Blaubergs and M. D. S. Braine. Short-term memory limitations on decoding self-embedded sentences. Journal of Experimental Psychology, 102(4):745–748, 1974.<br />
A. L. Blumenthal. Observations with self-embedded sentences. Psychonomic Science, 6(10):453–454, 1966.<br />
J. K. Bock. An effect of the accessibility of word forms on sentence structures. Journal of Memory and Language, 26(2):119–137, 1987.<br />
D. Caplan, S. Vijayan, G. Kuperberg, C. West, G. Waters, D. Greve, and A. M. Dale. Vascular responses to syntactic processing: Event-related fMRI study of relative clauses. Human Brain Mapping, 15(1):26–38, 2002.<br />
N. Chomsky. Lectures on Government and Binding: The Pisa Lectures. Studies in Generative Grammar. Foris Publications, 1981.<br />
N. Chomsky. Aspects of the Theory of Syntax. MIT Press, Cambridge, 1965.<br />
N. Chomsky. Syntactic Structures. Mouton, The Hague, 1957.<br />
M. H. Christiansen. The (non)necessity of recursion in natural language. In Proceedings of the Fourteenth Annual Conference of the Cognitive Science Society. Lawrence Erlbaum Associates, 1992.<br />
M. H. Christiansen and N. Chater. Toward a connectionist model of recursion in human linguistic performance. Cognitive Science, 23(2):157–205, 1999.<br />
M. H. Christiansen and M. C. MacDonald. Processing of recursive sentence structure: Testing predictions from a connectionist model. Manuscript in preparation, 1999.<br />
L. Cohen and J. Mehler. Click monitoring revisited: An on-line study of sentence comprehension. Memory & Cognition, 24(1):94–102, 1996.<br />
M. Corley and S. Corley. Cross-linguistic and intra-linguistic evidence for the use of statistics in human sentence processing. Unpublished manuscript, University of Exeter, 1995.<br />
F. Cuetos, D. Mitchell, and M. Corley. Parsing in different languages. In Language Processing in Spanish, pages 145–187, 1996.<br />
M. Daneman and P. A. Carpenter. Individual differences in working memory and reading. Journal of Verbal Learning and Verbal Behavior, 19(4):450–466, 1980.<br />
G. S. Dell and P. G. O’Seaghdha. Stages of lexical access in language production. Cognition, 42(1-3):287–314, 1992.<br />
G. S. Dell, L. K. Burger, and W. R. Svec. Language production and serial order: A functional analysis and a model. Cognitive Modeling, 2002.<br />
T. Desmet and E. Gibson. Disambiguation preferences and corpus frequencies in noun phrase conjunction. Journal of Memory and Language, 49(3):353–374, 2003.<br />
J. A. Van Dyke and R. L. Lewis. Distinguishing effects of structure and decay on attachment and repair: A cue-based parsing account of recovery from misanalyzed ambiguities. Journal of Memory and Language, 49(3):285–316, 2003.<br />
J. A. Van Dyke and B. McElree. Retrieval interference in sentence comprehension. Journal of Memory and Language, 55(2):157–166, 2006.<br />
S. F. Ehrlich and K. Rayner. Contextual effects on word perception and eye movements during reading. Journal of Verbal Learning and Verbal Behavior, 20:641–655, 1981.<br />
J. L. Elman. Tlearn simulator. Software available at http://crl.ucsd.edu/innate/tlearn.html, 1992.<br />
J. L. Elman. Finding structure in time. Cognitive Science, 14(2):179–211, 1990.<br />
P. Erdmann. Ist das Deutsche eine SOV-Sprache? Zeitschrift für deutsche Sprache und Literatur, 1990.<br />
F. Ferreira. The misinterpretation of noncanonical sentences. Cognitive Psychology, 47(2):164–203, 2003.<br />
M. Ford. A method for obtaining measures of local parsing complexity throughout sentences. Journal of Verbal Learning and Verbal Behavior, 22:203–218, 1983.<br />
K. I. Forster, C. Guerrera, and L. Elliot. The maze task: Measuring forced incremental sentence processing time. Manuscript submitted for publication, 2008.<br />
U. Frauenfelder and J. Segui. Monitoring around the relative clause. Journal of Verbal Learning and Verbal Behavior, 19:328–337, 1980.<br />
L. Frazier. Syntactic processing: Evidence from Dutch. Natural Language and Linguistic Theory, 5(4):519–559, 1987.<br />
L. Frazier. On Comprehending Sentences: Syntactic Parsing Strategies. PhD thesis, University of Connecticut, 1979.<br />
L. Frazier. Syntactic complexity. In D. R. Dowty, L. Kartunnen, and A. M. Zwicky, editors, Natural Language Parsing: Psychological, Computational, and Theoretical Perspectives, pages 129–189. Cambridge University Press, 1985.<br />
L. Frazier and C. Clifton. Successive cyclicity in the grammar and the parser. Language and Cognitive Processes, 4:93–126, 1989.<br />
L. Frazier and C. Clifton. Construal. MIT Press, 1996.<br />
L. Frazier and G. B. Flores d’Arcais. Filler-driven parsing: A study of gap filling in Dutch. Journal of Memory and Language, 28:331–344, 1989.<br />
L. Frazier, C. Clifton, and J. Randall. Filling gaps: Decision principles and structure in sentence comprehension. Cognition, 13(2):187–222, 1983.<br />
E. Gibson. The dependency locality theory: A distance-based theory of linguistic complexity. In Image, Language, Brain: Papers from the First Mind Articulation Project Symposium, pages 95–126, 2000.<br />
E. Gibson. Linguistic complexity: Locality of syntactic dependencies. Cognition, 68(1):1–76, 1998.<br />
E. Gibson and C. T. Schütze. Disambiguation preferences in noun phrase conjunction do not mirror corpus frequency. Journal of Memory and Language, 40(2):263–279, 1999.<br />
E. Gibson and J. Thomas. Memory limitations and structural forgetting: The perception of complex ungrammatical sentences as grammatical. Language and Cognitive Processes, 14(3):225–248, 1999.<br />
E. Gibson, T. Desmet, D. Grodner, D. Watson, and K. Ko. Reading relative clauses in English. Cognitive Linguistics, 16(2):313–353, 2005a.<br />
E. Gibson, K. Nakatani, and E. Chen. Distinguishing theories of syntactic storage cost in sentence comprehension: Evidence from Japanese. To appear, 2005b.<br />
P. C. Gordon, R. Hendrick, and M. Johnson. Memory interference during language processing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27(6):1411–1423, 2001.<br />
P. C. Gordon, R. Hendrick, and W. H. Levine. Memory-load interference in syntactic processing. Psychological Science, 13(5):425–430, 2002.<br />
P. C. Gordon, R. Hendrick, and M. Johnson. Effects of noun phrase type on sentence complexity. Journal of Memory and Language, 51(1):97–114, 2004.<br />
P. C. Gordon, R. Hendrick, M. Johnson, and Y. Lee. Similarity-based interference during language comprehension: Evidence from eye tracking during reading. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32(6):1304, 2006.<br />
A. Gouvea. Processing Syntactic Complexity: Cross-linguistic Differences and ERP Evidence. PhD thesis, University of Maryland, College Park, 2003.<br />
J. H. Greenberg. Some universals of grammar with particular reference to the order of meaningful elements. In J. H. Greenberg, editor, Universals of Language, pages 73–113. MIT Press, London, 1963.<br />
D. Grodner and E. Gibson. Consequences of the serial nature of linguistic input for sentential complexity. Cognitive Science, 29(2):261–290, 2005.<br />
J. Hale. A probabilistic Earley parser as a psycholinguistic model. In Proceedings of the North American Chapter of the Association for Computational Linguistics, pages 1–8, 2001.<br />
B. Hemforth, L. Konieczny, and C. Scheepers. Syntactic attachment and anaphor resolution: Two sides of relative clause attachment. In Architectures and Mechanisms for Language Processing, pages 259–281, 2000.<br />
R. N. A. Henson. Short-term memory for serial order: The Start-End model. Cognitive Psychology, 36(2):73–137, 1998.<br />
V. M. Holmes and J. K. O’Regan. Eye fixation patterns during the reading of relative clause sentences. Journal of Verbal Learning and Verbal Behavior, 20(4):1, 1981.<br />
F. Hsiao and E. Gibson. Processing relative clauses in Chinese. Cognition, 90(1):3–27, 2003.

C.-C. Hsu, F. Hurewitz, and C. Phillips. Contextual and syntactic cues for head-final relative clauses in Chinese. In The 19th Annual CUNY Conference on Human Sentence Processing, New York, NY, 2006.

T. Ishizuka, K. Nakatani, and E. Gibson. Processing Japanese relative clauses in context. In The 19th Annual CUNY Conference on Human Sentence Processing, CUNY Graduate Center, New York, NY, 2006.

T. F. Jaeger, E. Fedorenko, P. Hofmeister, and E. Gibson. Expectation-based syntactic processing: Antilocality outside of head-final languages. CUNY Sentence Processing Conference, North Carolina, 2008.

C. Juliano and M. K. Tanenhaus. A constraint-based lexicalist account of the subject/object attachment preference. Journal of Psycholinguistic Research, 23(6):459–471, 1994.

D. Jurafsky. A probabilistic model of lexical and syntactic access and disambiguation. Cognitive Science, 20(2):137–194, 1996.

M. A. Just and P. A. Carpenter. A capacity theory of comprehension: Individual differences in working memory. Psychological Review, 99(1):122–149, 1992.

M. A. Just and P. A. Carpenter. A capacity theory of comprehension: Individual differences in working memory. In Cognitive Modeling, pages 131–177, 2002.

M. A. Just and S. Varma. A hybrid architecture for working memory: Reply to MacDonald and Christiansen (2002). Psychological Review, 109(1):55–65, 2002.

M. A. Just, P. A. Carpenter, and J. D. Woolley. Paradigms and processes in reading comprehension. Journal of Experimental Psychology: General, 111(2):228–238, 1982.

M. A. Just, P. A. Carpenter, T. A. Keller, W. F. Eddy, and K. R. Thulborn. Brain activation modulated by sentence comprehension. Science, 274(5284):114, 1996.

A. Kawamoto. Nonlinear dynamics in the resolution of lexical ambiguity: A parallel distributed processing account. Journal of Memory and Language, 32(4):474–516, 1993.

E. L. Keenan and B. Comrie. Noun phrase accessibility and universal grammar. Linguistic Inquiry, pages 63–99, 1977.

J. King and M. A. Just. Individual differences in syntactic processing: The role of working memory. Journal of Memory and Language, 30(5):580–602, 1991.
J. W. King and M. Kutas. Who did what and when? Using word- and clause-level ERPs to monitor working memory usage in reading. Journal of Cognitive Neuroscience, 7(3):376–395, 1995.

L. Konieczny. Locality and parsing complexity. Journal of Psycholinguistic Research, 29(6):627–645, 2000.

L. Konieczny and N. Ruh. What's in an error? A reply to MacDonald and Christiansen (2002). Manuscript submitted, University of Freiburg, 2003.

K. Kuo and S. Vasishth. Processing Chinese relative clauses: Evidence for the universal subject preference. Manuscript submitted, 2007.

N. Kwon, M. Polinsky, and R. Kluender. Processing of relative clause sentences in Korean. Poster presented at the AMLaP Conference, 2004.

W. Larkin and D. Burns. Sentence comprehension and memory for embedded structure. Memory and Cognition, 5(1):17–22, 1977.

R. Levy. Expectation-based syntactic comprehension. Cognition, 106(3):1126–1177, 2008.

R. Lewis. A theory of grammatical but unacceptable embeddings. Journal of Psycholinguistic Research, 25:93–116, 1996.

R. L. Lewis and S. Vasishth. An activation-based model of sentence processing as skilled memory retrieval. Cognitive Science, 29:1–45, May 2005.

R. L. Lewis, S. Vasishth, and J. Van Dyke. Computational principles of working memory in sentence comprehension. Trends in Cognitive Sciences, 10(10):447–454, October 2006.

C. C. Lin. The psychological reality of head-final relative clauses. Paper presented at the International Workshop on Relative Clauses, Academia Sinica, Taipei, 2007.

C. C. Lin and T. G. Bever. Chinese is no exception: Universal subject preference of relative clause processing. Paper presented at The 19th Annual CUNY Conference on Human Sentence Processing, CUNY Graduate Center, New York, NY, 2006a.

C. C. Lin and T. G. Bever. Subject preference in the processing of relative clauses in Chinese. In D. Baumer, D. Montero, and M. Scanlon, editors, Proceedings of the 25th West Coast Conference on Formal Linguistics, pages 254–260. Cascadilla Proceedings Project, Somerville, MA, 2006b.

C.-J. C. Lin and T. G. Bever. Processing head-final relative clauses without garden paths. Paper presented at the International Conference on Processing Head-Final Structures, Rochester Institute of Technology, Rochester, NY, September 21–22, 2007.
C. J. C. Lin, S. Fong, and T. G. Bever. Constructing filler-gap dependencies in Chinese possessor relative clauses. In Proceedings of PACLIC, 2005.

Y. Lin and S. Garnsey. Plausibility and the resolution of temporary ambiguity in relative clause comprehension in Mandarin. In Proceedings of the 20th Annual CUNY Conference on Human Sentence Processing, 2007.

M. C. MacDonald and M. H. Christiansen. Reassessing working memory: Comment on Just and Carpenter (1992) and Waters and Caplan (1996). Psychological Review, 109(1):35–54, 2002.

M. C. MacDonald, N. J. Pearlmutter, and M. S. Seidenberg. Lexical nature of syntactic ambiguity resolution. Psychological Review, 101(4):676–703, October 1994.

B. MacWhinney. Starting points. Language, 53(1):152–168, 1977.

B. MacWhinney. Basic syntactic processes. Language Development, 1:73–136, 1982.

B. MacWhinney and C. Pleh. The processing of restrictive relative clauses in Hungarian. Cognition, 29(2):95–141, 1988.

W. M. Mak, W. Vonk, and H. Schriefers. The influence of animacy on relative clause processing. Journal of Memory and Language, 47(1):50–68, 2002.

Y. Matsumoto. Noun-modifying Constructions in Japanese: A Frame-semantic Approach. John Benjamins, 1997.

J. L. McClelland and J. L. Elman. The TRACE Model of Speech Perception. Center for Research in Language, University of California, San Diego, La Jolla, 1984.

A. Mecklinger, H. Schriefers, K. Steinhauer, and A. D. Friederici. Processing relative clauses varying on syntactic and semantic dimensions: An analysis with event-related potentials. Memory & Cognition, 23(4):477–494, 1995.

D. C. Mitchell, F. Cuetos, M. M. B. Corley, and M. Brysbaert. Exposure-based models of human parsing: Evidence for the use of coarse-grained (nonlexical) statistical records. Journal of Psycholinguistic Research, 24(6):469–488, 1995.

E. Miyamoto and M. Nakamura. Subject/object asymmetries in the processing of relative clauses in Japanese. In Proceedings of WCCFL, volume 22, pages 342–355, 2003.

X. Qiao and K. I. Forster. Object relatives are easier than subject relatives in Chinese. In Proceedings of the AMLaP Conference, 2008.

D. S. Race and M. C. MacDonald. The use of "that" in the production and comprehension of object relative clauses. In Proceedings of the 25th Annual Meeting of the Cognitive Science Society, pages 946–951, 2003.
F. Reali and M. H. Christiansen. Processing of relative clauses is made easier by frequency of occurrence. Journal of Memory and Language, 57(1):1–23, 2007a.

F. Reali and M. H. Christiansen. Word chunk frequencies affect the processing of pronominal object-relative clauses. The Quarterly Journal of Experimental Psychology, 60(2):161–170, 2007b.

R. Roberts and E. Gibson. Individual differences in sentence memory. Journal of Psycholinguistic Research, 31(6):573–598, November 2002.

P. Rodriguez. Simple recurrent networks learn context-free and context-sensitive languages by counting. Neural Computation, 13, 2001.

D. L. T. Rohde. The Simple Language Generator: Encoding complex languages with simple grammars. Technical Report CMU-CS-99-123, Carnegie Mellon University, Department of Computer Science, 1999.

D. L. T. Rohde. A Connectionist Model of Sentence Comprehension and Production. PhD thesis, Carnegie Mellon University, 2002.

D. E. Rumelhart and J. L. McClelland. On Learning the Past Tenses of English Verbs. Center for Research in Language, University of California, San Diego, La Jolla, 1985.

H. Schriefers, A. D. Friederici, and K. Kuhn. The processing of locally ambiguous relative clauses in German. Journal of Memory and Language, 34(4):499–520, 1995.

M. S. Seidenberg and J. L. McClelland. A distributed, developmental model of word recognition and naming. Psychological Review, 96(4):523–568, 1989.

M. Spivey-Knowlton. Quantitative predictions from a constraint-based theory of syntactic ambiguity resolution. In Proceedings of the 1993 Connectionist Models Summer School, pages 130–137. Lawrence Erlbaum Associates, 1994.

K. Stromswold, D. Caplan, N. Alpert, and S. Rauch. Localization of syntactic comprehension by positron emission tomography. Brain and Language, 52(3):452–473, 1996.

W. Tabor, C. Juliano, and M. K. Tanenhaus. Parsing in a dynamical system: An attractor-based account of the interaction of lexical and structural constraints in sentence processing. Language and Cognitive Processes, 12(2/3):211–271, 1997.

W. Tabor, B. Galantucci, and D. Richardson. Effects of merely local syntactic coherence on sentence processing. Journal of Memory and Language, 50(4):355–370, May 2004.

W. L. Taylor. Cloze procedure: A new tool for measuring readability. Journalism Quarterly, 30(4):415–433, 1953.
R. S. Tomlin. Basic Word Order: Functional Principles. Croom Helm, London, 1986.

M. J. Traxler, R. K. Morris, and R. E. Seely. Processing subject and object relative clauses: Evidence from eye movements. Journal of Memory and Language, 47(1):69–90, July 2002.

S. Vasishth. Integration and prediction in head-final structures. In Processing and Producing Head-Final Structures, 2008.

S. Vasishth and R. L. Lewis. Human language processing: Symbolic models. In K. Brown, editor, Encyclopedia of Language and Linguistics, volume 5, pages 410–419. Elsevier, 2006a.

S. Vasishth and R. L. Lewis. Argument-head distance and processing complexity: Explaining both locality and antilocality effects. Language, 82(4):767–794, 2006b.

S. Vasishth, K. Suckow, R. Lewis, and S. Kern. Short-term forgetting in sentence comprehension: Crosslinguistic evidence from head-final structures. Submitted to Language and Cognitive Processes, 2008.

G. S. Waters and D. Caplan. The capacity theory of sentence comprehension: Critique of Just and Carpenter (1992). Psychological Review, 103(4):761–772, 1996.

J. B. Wells, M. H. Christiansen, D. S. Race, D. J. Acheson, and M. C. MacDonald. Experience and sentence processing: Statistical learning and relative clause comprehension. Cognitive Psychology, 58(2):250–271, 2009.

M. Yoshida, S. Aoshima, and C. Phillips. Relative clause prediction in Japanese. In Proceedings of the 17th Annual CUNY Conference on Human Sentence Processing, College Park, Maryland, 2004.
Appendix A

Statistics
                  SRC                       ORC
region       mean        se           mean        se
N1/V1        0.1746577   0.0568403    0.2183628   0.2644270
de           0.3303811   0.1897248    0.1207579   0.0726347
N2           0.1001319   0.0391884    0.1064459   0.0342768

Table A.1: Statistics for simulation 1
                  SRC                       ORC
region       mean        se           mean        se
N1/V1        0.1319658   0.06477893   0.2172967   0.2631532
de           0.0870274   0.07931393   0.1096769   0.0772652
N2           0.1001319   0.03918843   0.1064459   0.0342768

Table A.2: Statistics for simulation 2
                  drop-V2                   no-drop
region       mean        se           mean        se
V3           0.7976032   0.1091270    0.7977388   0.1090123
V1           0.8639397   0.0581128    0.9801276   0.0140664
post-V1      0.1610184   0.1042171    0.2658784   0.1700682

Table A.3: Statistics for simulation 3a
                  drop-V2                   no-drop
region       mean        se           mean        se
V3           0.1794197   0.0801674    0.1797425   0.08047316
V1           0.6870279   0.0550648    0.7735128   0.05198529
post-V1      0.1365552   0.1125236    0.1522183   0.0880624

Table A.4: Statistics for simulation 3b
                  drop-V2                   no-drop
region       mean        se           mean        se
V3           0.1376888   0.1044064    0.1375375   0.1043905
V1           0.5554193   0.2136721    0.5564368   0.2350018
post-V1      0.3160993   0.122175     0.2946364   0.1392607

Table A.5: Statistics for simulation 4a
                  drop-V2                   no-drop
region       mean        se           mean        se
V3           0.1462594   0.1230590    0.1465719   0.1232603
V1           0.8691347   0.1514235    0.9761057   0.02860683
post-V1      0.3004169   0.2266813    0.4357311   0.1730854

Table A.6: Statistics for simulation 4b
Appendix B

Grammars

B.1 English

(written by Lars Konieczny, 2003)
S : NP VP "." |
    {num1, NP N, VP Vi} |   # number agreement in matrix clause
    {num2, NP N, VP Vt} ;

NP : det N | det N Rel (0.05) |
    {num1, N, Rel SRC VP Vi} |   # number agreement in subject RCs
    {num2, N, Rel SRC VP Vt} ;

Rel : SRC | ORC ;
SRC : that VP ;
ORC : that NP Vt |
    {num2, NP N, Vt} ;   # number agreement in object RCs

VP : Vi | Vt NP ;
N : Nsing | Nplur ;
Vi : VIsing | VIplur ;
Vt : VTsing | VTplur ;

### LEXICON ###

Nsing : lawyer | senator | reporter | banker | judge ;
Nplur : lawyers | senators | reporters | bankers | judges ;
VIsing : lies | lied | hesitates | hesitated | phones |
    phoned | understands | understood ;
VTsing : praises | praised | attacks | attacked |
    phones | phoned | understands | understood ;
VIplur : lie | lied | hesitate | hesitated | phone |
    phoned | understand | understood ;
VTplur : praise | praised | attack | attacked |
    phone | phoned | understand | understood ;
det : the ;

### RULES ###

num1 {   # for intransitive verbs
    Nsing : VIsing;
    Nplur : VIplur;
}
num2 {   # for transitive verbs
    Nsing : VTsing;
    Nplur : VTplur;
}
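The rules above form a small probabilistic context-free grammar with agreement constraints. As an illustration only (this is not the SLG tool itself), the following sketch samples sentences from a simplified version of the grammar: it keeps the core NP/VP and SRC/ORC rules and the 0.05 relative-clause weight, but drops the num1/num2 agreement constraints and adds a depth bound of my own to cap recursion; all function names are hypothetical.

```python
import random

# Simplified lexicon taken from the grammar above (singular forms only).
NSING = ["lawyer", "senator", "reporter", "banker", "judge"]
VI = ["lies", "hesitates", "phones"]
VT = ["praises", "attacks", "understands"]

def np(depth=0):
    # NP : det N | det N Rel (0.05) -- the depth bound caps recursion
    words = ["the", random.choice(NSING)]
    if depth < 2 and random.random() < 0.05:
        words += rel(depth + 1)
    return words

def rel(depth):
    # Rel : SRC | ORC ; SRC : that VP ; ORC : that NP Vt
    if random.random() < 0.5:
        return ["that"] + vp(depth)                    # subject RC
    return ["that"] + np(depth) + [random.choice(VT)]  # object RC

def vp(depth=0):
    # VP : Vi | Vt NP
    if random.random() < 0.5:
        return [random.choice(VI)]
    return [random.choice(VT)] + np(depth)

def sentence():
    # S : NP VP "."
    return " ".join(np() + vp() + ["."])

random.seed(1)
for _ in range(3):
    print(sentence())
```

Such samples (e.g. "the judge praises the lawyer that attacks the senator .") are the kind of input the simulations train on; the real SLG grammars additionally enforce the number-agreement rules listed above.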
B.2 German

(written by Daniel Müller and Lars Konieczny, 2004)
S : NPnom VP "." |
    {numNoun, NPnom Nnom, VP V} |   # number agreement, normal (N-V)
    {numNoun, NPnomx Nnom, V} ;     # number agreement, topicalized

NPnomx : DETnom Nnom RCx |
    {numDET, DETnom, Nnom} |                        # number agreement DET-N
    {numREL, Nnom, RCx RCpure RCsub RELnom} |       # num N-RelPron nom
    {numREL, Nnom, RCx RCpure RCobj RELakk} |       # num N-RelPron akk
    {numNoun, Nnom, RCx RCpure RCsub V} ;           # num N-V, embedded SRC

NPnom : DETnom Nnom RC |
    {numDET, DETnom, Nnom} |                        # number agreement DET-N
    {numREL, Nnom, RC RCpure RCsub RELnom} |        # num N-RelPron nom
    {numREL, Nnom, RC RCpure RCobj RELakk} |        # num N-RelPron akk
    {numNoun, Nnom, RC RCpure RCsub V} ;            # num N-V, embedded SRC

NPakkx : DETakk Nakk RCx |
    {numDET, DETakk, Nakk} |
    {numREL, Nakk, RCx RCpure RCsub RELnom} |
    {numREL, Nakk, RCx RCpure RCobj RELakk} |
    {numNoun, Nakk, RCx RCpure RCsub V} ;

NPakk : DETakk Nakk RC |
    {numDET, DETakk, Nakk} |
    {numREL, Nakk, RC RCpure RCsub RELnom} |
    {numREL, Nakk, RC RCpure RCobj RELakk} |
    {numNoun, Nakk, RC RCpure RCsub V} ;

VP : V NPakkx ;
RCx : "," RCpure | "" (0.9) ;
RC : "," RCpure "," | "" (0.9) ;
RCpure : RCsub | RCobj ;
RCsub : RELnom NPakk V ;
RCobj : RELakk NPnom V |
    {numNoun, NPnom Nnom, V} ;

Nnom : Nnom_pl | Nnom_sing ;
Nakk : Nakk_pl | Nakk_sing ;
V : V_pl | V_sing ;
DETnom : DETnom_pl | DETnom_sing (0.7) ;
DETakk : DETakk_pl | DETakk_sing (0.7) ;
RELnom : RELnom_pl | RELnom_sing ;
RELakk : RELakk_pl | RELakk_sing ;

Nnom_pl : Jungen | Polizisten | Passanten | Menschen ;
Nakk_pl : Jungen | Polizisten | Passanten | Menschen ;
Nnom_sing : Junge | Polizist | Passant | Mensch ;
Nakk_sing : Jungen | Polizisten | Passanten | Menschen ;
V_pl : beschimpfen | treffen | rufen | verspotten ;
V_sing : beschimpft | trifft | ruft | verspottet ;
DETnom_pl : die ;
DETnom_sing : der ;
DETakk_pl : die ;
DETakk_sing : den ;
RELnom_pl : die ;
RELnom_sing : der ;
RELakk_pl : die ;
RELakk_sing : den ;

numNoun {
    Nnom_sing : V_sing;
    Nnom_pl : V_pl;
    Nakk_sing : V_sing;
    Nakk_pl : V_pl;
}
numDET {
    DETnom_pl : Nnom_pl;
    DETakk_pl : Nakk_pl;
    DETnom_sing : Nnom_sing;
    DETakk_sing : Nakk_sing;
}
numREL {
    Nnom_pl : RELnom_pl | RELakk_pl;
    Nakk_pl : RELnom_pl | RELakk_pl;
    Nnom_sing : RELnom_sing | RELakk_sing;
    Nakk_sing : RELnom_sing | RELakk_sing;
}
B.3 Mandarin

S : NP VP END ;
VP : Vt NP | Vi ;
NP : N | Rel N (0.5) ;
Rel : SRC (0.85) | ORC ;
SRC : VP GEN ;
ORC : NP Vt GEN ;
N : Nsing | Nplur ;

### LEXICON ###

Nsing : lushi | guanyuan | fayanren |
    yinhangjia | faguan ;
Nplur : lushimen | guanyuanmen | fayanren |
    yinhangjiamen | faguanmen ;
Vi : sahuang | youyu | dadianhua | lijie ;
Vt : biaoyang | gongji | lijie ;
GEN : de ;
END : "." ;

biaoyang        praise
dadianhua       phone
de              gen
faguan          judge
faguanmen       judges
fayanren        reporter/reporters
gongji          attack
guanyuan        senator
guanyuanmen     senators
lijie           understand
lushi           lawyer
lushimen        lawyers
sahuang         lie
yinhangjia      banker
yinhangjiamen   bankers
youyu           hesitate

Table B.1: Mandarin lexicon
Declaration of Authorship

I hereby declare in lieu of an oath that I have written this thesis without the help of third parties and without the use of any aids other than those specified; all ideas taken directly or indirectly from other sources are identified as such. This thesis has not previously been submitted in the same or a similar form to any other examination authority, and it has not yet been published.

Place, date                                        Signature