Connectionist Modeling of Experience-based Effects in Sentence ...
Chapter 3 Connectionist Modelling of Language Comprehension
their experiment in detail. MC02 used a standard simple recurrent network (SRN) with a hidden and a context layer of 60 units each. Input and output layers of 31 units each represented 30 words plus an end-of-sentence (EOS) symbol. The corpora each consisted of 10,000 English sentences constructed randomly from a simple artificial probabilistic context-free grammar (PCFG). Subject- or object-modifying relative clauses occurred in 5% of the sentences; half were subject-extracted and half object-extracted RCs. The rest of each corpus consisted of simple mono-clausal sentences. Verbs differed in transitivity and agreed in number with their subject nouns. Each corpus comprised about 55,000 words; sentence length ranged from 3 to 27 words, with a mean of 4.5. Notably, relative clauses could be embedded recursively in each noun phrase; the RC attachment probability in the PCFG (0.05) limited the embedding depth in practice. MC02 trained 10 networks with randomly distributed initial weights¹, each on a different corpus.
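The corpus construction described above can be sketched roughly as follows. The vocabulary, the relativizer, and the exact grammar rules here are hypothetical stand-ins (MC02's 30-word lexicon is not reproduced in this section); only the per-NP attachment probability of 0.05 and the recursive embedding come from the text, so the resulting proportion of RC-containing sentences is only approximate.

```python
import random

random.seed(1)

# Hypothetical two-form vocabulary standing in for MC02's 30 words;
# index 0 = singular form, index 1 = plural form (number agreement).
NOUNS = [("dog", "dogs")]
VERBS = [("chases", "chase")]
P_RC = 0.05   # RC attachment probability per noun phrase, from the PCFG

def np_(depth=0):
    """Noun phrase; returns (words, number). An RC attaches with
    probability 0.05 at every NP, recursively, so deep embeddings
    are possible but rare."""
    num = random.randrange(2)
    words = [NOUNS[0][num]]
    if random.random() < P_RC:
        if random.random() < 0.5:                 # subject-extracted RC
            obj, _ = np_(depth + 1)
            words += ["who", VERBS[0][num]] + obj
        else:                                     # object-extracted RC
            subj, snum = np_(depth + 1)
            words += ["who"] + subj + [VERBS[0][snum]]
    return words, num

def sentence():
    """Simple transitive mono-clausal sentence, verb agreeing with subject."""
    subj, num = np_()
    obj, _ = np_()
    return subj + [VERBS[0][num]] + obj + ["EOS"]  # end-of-sentence symbol

corpus = [sentence() for _ in range(10000)]
```

With two NPs per clause and a 0.05 attachment probability each, RCs appear in roughly 5–10% of sentences, in the same range as the 5% reported for MC02's corpora.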
The learning rate was set to 0.1. The training phase covered only three epochs, each consisting of one pass through the corpus. The networks learned to predict the next word in a sentence without being provided with any explicit probabilistic information. The output unit activations were computed by a cross-entropy algorithm that ensured all activation values summed to one. In that way the networks' output was comparable to continuation likelihoods assigned to each possible word. After training, the networks were assessed on 10 sentences of each of the three types (SRC, ORC, and simple clause).
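The architecture and training regime can be sketched in minimal form as follows. The initial weight range and learning rate are from the text; the toy word indices are arbitrary, and for brevity only the output weights are updated here, whereas MC02 of course trained the full network by backpropagation. The normalized output makes each prediction a distribution over possible continuations.

```python
import numpy as np

rng = np.random.default_rng(0)

V, H = 31, 60                           # 30 words + EOS; hidden/context size
# Initial weights drawn uniformly from [-0.15, 0.15], as in MC02
Wxh = rng.uniform(-0.15, 0.15, (H, V))  # input -> hidden
Whh = rng.uniform(-0.15, 0.15, (H, H))  # context (previous hidden) -> hidden
Why = rng.uniform(-0.15, 0.15, (V, H))  # hidden -> output
LR = 0.1

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()                  # output activations sum to one

def step(word_idx, context):
    """One SRN step: predict a distribution over the next word."""
    x = np.zeros(V); x[word_idx] = 1.0          # localist input coding
    h = sigmoid(Wxh @ x + Whh @ context)        # hidden layer
    y = softmax(Why @ h)                        # continuation likelihoods
    return y, h

def train_sentence(words):
    """Output-layer-only SGD sketch under cross-entropy error on each
    next-word prediction (the full model updates all weight matrices)."""
    global Why
    context = np.zeros(H)
    for cur, nxt in zip(words, words[1:]):
        y, h = step(cur, context)
        grad = np.outer(y - np.eye(V)[nxt], h)  # dCE/dWhy for softmax output
        Why -= LR * grad
        context = h                              # copy hidden to context layer

train_sentence([3, 7, 12, 30])          # toy sentence ending in EOS (index 30)
probs, _ = step(3, np.zeros(H))
print(round(probs.sum(), 6))            # → 1.0
```

The copy of the hidden layer into the context layer at each step is what gives the SRN its limited memory for the preceding words of the sentence.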
To interpret the network output in terms of processing difficulty, MC02 calculated the so-called grammatical prediction error² (GPE). The GPE value measures the network's difficulty in making the correct prediction at each word. The measure was then used to map the relative word-by-word differences between the conditions onto reading times from the study by King and Just (1991). Besides RC type, MC02 used training epochs as a second factor: the network performances after one, two, and three epochs of training were compared to low-, mid-, and high-span readers' reading speed.
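The full GPE measure is described in chapter 4; as a rough, simplified sketch (assuming only that the set of grammatical continuations is known at each position), GPE can be approximated as the share of output activation that falls on ungrammatical continuations, so that low values indicate easy, accurate prediction.

```python
import numpy as np

def simplified_gpe(output_activations, grammatical_idx):
    """Simplified grammatical prediction error: the fraction of the
    (normalized) output activation assigned to words that are NOT
    grammatical continuations at this point in the sentence. The full
    GPE additionally weighs mispredictions among the grammatical
    items themselves; see chapter 4."""
    a = np.asarray(output_activations, dtype=float)
    a = a / a.sum()                        # ensure activations sum to one
    hits = a[list(grammatical_idx)].sum()  # activation on legal next words
    return 1.0 - hits                      # activation on illegal next words

# 31-unit output vector; suppose words 3 and 7 are grammatical here
act = np.full(31, 0.01)
act[3], act[7] = 0.5, 0.2
print(round(simplified_gpe(act, [3, 7]), 3))  # → 0.293
```

Averaging such per-word values over a region yields the word-by-word difficulty profiles that MC02 compare to the reading-time data.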
The results of MC02's network simulation are shown in figure 3.3. Pooled over all three epochs, the results show a clear subject preference on the main verb (praised) and the preceding region (the embedded object in the SRC and the embedded verb in the ORC). Furthermore, the ORC performance improves significantly on the embedded and main verb across the three epochs of training. Notably, the SRC data do not show such an improvement; rather, performance was relatively good from the start, with no change during training. This indicates a clause type × exposure interaction. The same interaction (in this case clause type × reading span) is seen in King and Just's empirical data (figure 2.1). Notably, the simple SRN model seems to make better predictions than the CC-READER model by Just and Carpenter (1992), since CC-READER captures the span effect but not the interaction with clause type (see figure 2.4). Importantly, MC02 call the mentioned interaction a Frequency × Regularity interaction. Specifically, the regular nature of English SRCs with respect to word order (SVO) serves
¹ Between −0.15 and 0.15.
² See chapter 4 for a detailed description.