
their experiment in detail. MC02 used a standard simple recurrent network (SRN) with a hidden and a context layer of 60 units each. Input and output layers of 31 units each represented 30 words plus an end-of-sentence (EOS) symbol. Each corpus consisted of 10,000 English sentences constructed randomly from a simple artificial probabilistic context-free grammar (PCFG). Subject- or object-modifying relative clauses were contained in 5% of the sentences; half were subject-extracted and half were object-extracted RCs. The rest of each corpus consisted of simple mono-clausal sentences. Verbs differed in transitivity and agreed in number with their subject nouns. Each corpus comprised about 55,000 words; sentence length ranged from 3 to 27 words with a mean of 4.5. Notably, relative clauses could be embedded recursively in each noun phrase. The RC attachment probability in the PCFG (0.05) limited the embedding depth.
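To make the corpus construction concrete, the following is a minimal sketch of sampling sentences from a toy PCFG. The rules and the tiny lexicon are invented for illustration and ignore the agreement and transitivity constraints of MC02's grammar; only the 0.05 RC attachment probability and the corpus size of 10,000 sentences are taken from the text.

```python
import random

# Toy PCFG: each non-terminal maps to a list of (probability, expansion) pairs.
GRAMMAR = {
    "S":  [(1.0, ["NP", "VP", "EOS"])],
    "NP": [(0.95, ["N"]),
           (0.05, ["N", "RC"])],        # low RC attachment probability limits embedding depth
    "RC": [(0.5, ["who", "VP"]),        # subject-extracted relative clause
           (0.5, ["who", "NP", "V"])],  # object-extracted relative clause
    "VP": [(0.5, ["V"]),
           (0.5, ["V", "NP"])],
    "N":  [(0.5, ["reporter"]), (0.5, ["senator"])],
    "V":  [(0.5, ["praised"]), (0.5, ["attacked"])],
}

def expand(symbol):
    """Recursively expand a symbol into a list of terminal words."""
    if symbol not in GRAMMAR:                        # terminal symbol
        return [symbol]
    r, acc = random.random(), 0.0
    for prob, rhs in GRAMMAR[symbol]:
        acc += prob
        if r <= acc:
            return [word for s in rhs for word in expand(s)]
    return [word for s in GRAMMAR[symbol][-1][1] for word in expand(s)]

corpus = [expand("S") for _ in range(10_000)]        # one 10,000-sentence corpus
```

Because every relative clause itself contains a noun phrase, recursive embedding is possible in principle, but the 0.05 attachment probability makes deeply embedded clauses vanishingly rare.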

MC02 trained 10 networks with randomly distributed initial weights¹, each on a different corpus. The learning rate was set to 0.1. The training phase covered only three epochs, each consisting of one pass through the corpus. The networks learned to predict the next word in a sentence without being provided with any probabilistic information.

The output unit activations were calculated with a cross-entropy algorithm that ensured all activation values summed to one. In that way the networks' output was comparable to continuation likelihoods assigned to each possible word.
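A rough sketch of such an Elman-style SRN is given below, using the layer sizes, the weight initialization range (footnote 1), and the learning rate stated in the text. The localist word encoding, the tanh hidden units, the softmax output, and the output-only weight update are simplifying assumptions for illustration, not MC02's actual implementation.

```python
import numpy as np

VOCAB, HIDDEN, LR = 31, 60, 0.1            # 30 words + EOS; learning rate from the text

rng = np.random.default_rng(0)
def init(shape):
    return rng.uniform(-0.15, 0.15, shape)  # initial weights in [-0.15, 0.15] (footnote 1)

W_ih = init((HIDDEN, VOCAB))    # input   -> hidden
W_ch = init((HIDDEN, HIDDEN))   # context -> hidden (copy of the previous hidden state)
W_ho = init((VOCAB, HIDDEN))    # hidden  -> output

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()                       # activations sum to one: next-word likelihoods

def step(word_idx, context):
    """One time step: read a word, return the next-word distribution and the new hidden state."""
    x = np.zeros(VOCAB)
    x[word_idx] = 1.0                        # localist (one-hot) encoding of the current word
    hidden = np.tanh(W_ih @ x + W_ch @ context)
    return softmax(W_ho @ hidden), hidden    # the hidden state becomes the next context

def train_step(word_idx, next_idx, context):
    """Simplified update of the output weights only; full SRN training would also
    propagate the error back into W_ih and W_ch."""
    global W_ho
    probs, hidden = step(word_idx, context)
    target = np.zeros(VOCAB)
    target[next_idx] = 1.0
    W_ho -= LR * np.outer(probs - target, hidden)   # softmax + cross-entropy gradient
    return hidden                                    # carried over as the next context
```

Training in this sketch would consist of three passes over a 10,000-sentence corpus, with the context vector reset (e.g. to zeros) at each sentence boundary.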

After training, the networks were assessed on 10 sentences of each of the three types (SRC, ORC, and simple clause). For interpreting the network output in terms of processing difficulty, MC02 calculated the so-called grammatical prediction error² (GPE). The GPE value is a measure of the network's difficulty in making the correct prediction at each word. This measure was then used to map the relative word-by-word differences between the conditions onto reading times from the study by King and Just (1991).
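The full definition of the GPE is deferred to chapter 4; purely to illustrate the idea, the deliberately simplified stand-in below scores a word as difficult to the extent that the network assigns little activation to the continuations that are grammatical at that point.

```python
def simplified_word_difficulty(next_word_probs, grammatical_indices):
    """Toy stand-in for the GPE: 1 minus the total activation assigned to the
    grammatically licensed next words (the full measure is more involved)."""
    hits = sum(next_word_probs[i] for i in grammatical_indices)
    return 1.0 - hits
```

Averaging such per-word scores over a region of interest (e.g. the embedded or main verb) for SRC and ORC items after one, two, and three epochs yields the kind of word-by-word profiles that are compared to the reading-time data.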

Besides RC type, MC02 used training epochs as a second factor. The network performances after one, two, and three epochs of training were compared to the reading speed of low-, mid-, and high-span readers.

The results of MC02's network simulation are shown in figure 3.3. Pooled over all three epochs, the results show a clear subject preference on the main verb (praised) and the preceding region (the embedded object in the SRC and the embedded verb in the ORC). Furthermore, the ORC performance shows significant improvement on the embedded and main verb over the three epochs of training. Notably, the SRC data does not show such an improvement; rather, performance was relatively good from the start, with no change during training. This indicates a clause type × exposure interaction. The same interaction (in this case clause type × reading span) is seen in King and Just's empirical data (figure 2.1). Notably, the simple SRN model seems to make better predictions than the CC-READER model by Just and Carpenter (1992), since CC-READER captures the span effect but not the interaction with clause type (see figure 2.4). Importantly, MC02 call the mentioned interaction a Frequency × Regularity interaction. Specifically, the regular nature of English SRCs with respect to word order (SVO) serves

¹ Between -0.15 and 0.15.
² See chapter 4 for a detailed description.
