

predictor. This shows – as Mitchell et al. (1995) also admit – that the exposure-based approach has to find a balance between coarse and more fine-grained measures, and that different structures might require different grain sizes for the tabulation of frequencies.
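The effect of grain size on such frequency records can be made concrete with a small sketch (the toy corpus of functional-category strings is invented for illustration and is not taken from Mitchell et al.): tabulating the same data at a coarse grain, whole category strings, and at a finer grain, category bigrams, produces quite different records.

```python
from collections import Counter

# Invented toy corpus: sentences reduced to functional-category strings
corpus = [
    "S V O", "S V O", "S V", "S V O", "O S V",
    "S V O", "S V", "S V O O", "S V O",
]

# Coarse grain: count whole category strings
coarse = Counter(corpus)

# Finer grain: count category bigrams within each string
fine = Counter(
    bigram
    for s in corpus
    for bigram in zip(s.split(), s.split()[1:])
)

print(coarse.most_common(3))  # most frequent whole strings
print(fine.most_common(3))    # most frequent category bigrams
```

At the coarse grain, "S V O" and "S V O O" are unrelated records, while at the bigram grain they share most of their counts; this is exactly the kind of trade-off the balance mentioned above has to resolve.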

Structural Frequency in a Connectionist Network

One type of model that bases its predictions specifically on records of structural frequency is the connectionist network model. Mitchell et al. (1995) say that “in a connectionist system the records would consist of a set of activation weights distributed throughout the network” (p. 472). This is partly true. The network does not explicitly count frequencies, nor are frequencies stored anywhere in the network. Rather, every exposure of the network to a specific structure immediately changes the weight distributions and thus the behavior of the network as a whole. One could therefore say that the weight distributions contain implicit structural knowledge. This can be observed, for example, in the activations of hidden layers: inputs with similar structures result in similar activation patterns in the hidden layer of simple recurrent networks (SRN; Elman, 1990). The comparison of these patterns reveals the structural generalization levels that drive the network's predictions.
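As an illustration of how such hidden-layer activations can be read out and compared, the following sketch implements a bare-bones Elman network; the lexicon, the layer sizes, and the example sentences are invented for this illustration, the weights are untrained, and the structural clustering described above would only emerge after training on a corpus.

```python
import numpy as np

rng = np.random.default_rng(0)

class SRN:
    """Bare-bones Elman network; training and the output layer are
    omitted because only the hidden states are inspected here."""
    def __init__(self, n_in, n_hid):
        self.W_ih = rng.normal(0, 0.5, (n_hid, n_in))   # input -> hidden
        self.W_hh = rng.normal(0, 0.5, (n_hid, n_hid))  # context -> hidden
        self.n_hid = n_hid

    def run(self, seq):
        """Feed a sequence of one-hot word vectors and collect the
        hidden state after each word (the Elman update rule)."""
        h = np.zeros(self.n_hid)
        states = []
        for x in seq:
            h = np.tanh(self.W_ih @ x + self.W_hh @ h)
            states.append(h.copy())
        return np.array(states)

# Word-level encoding: the lower grain-size limit discussed below
vocab = {"the": 0, "dog": 1, "cat": 2, "chased": 3, "saw": 4}

def encode(sentence):
    seq = []
    for w in sentence.split():
        x = np.zeros(len(vocab))
        x[vocab[w]] = 1.0
        seq.append(x)
    return seq

net = SRN(n_in=len(vocab), n_hid=8)

# Final hidden states for two structurally parallel strings
# and one scrambled control
h1 = net.run(encode("the dog chased the cat"))[-1]
h2 = net.run(encode("the cat saw the dog"))[-1]
h3 = net.run(encode("cat the the saw dog"))[-1]

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print("parallel structures:", cosine(h1, h2))
print("scrambled control: ", cosine(h1, h3))
```

With trained weights, the two SVO sentences would be expected to yield a markedly higher hidden-state similarity than the scrambled control.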

Unlike symbolic exposure-based accounts, there is no explicitly fixed structural grain size that the network is sensitive to. There is, of course, a lower limit on grain size, which is defined by the encoding level of the input: if the input string is encoded on the word level, the network has no information below that level to work with. The upper limit depends on the network's architecture and can be affected by the size of the hidden layer, the learning mechanism, and, specifically for recurrent networks, by their “memory span”. Which levels the network actually chooses is hard to say in advance. Learning is a walk through a state space in search of the optimal solution to the desired input-output pairing. The choice of grain size is part of that optimization process and can change during the learning phase. A commitment to a specific grain size implicitly involves a commitment to the number of structures to distinguish. A fine-grained model will consequently have to keep apart many structural representations, while a very coarse-grained model has only a few structures to deal with. This relation means that a network with very few hidden nodes through which the information has to pass will only be able to make very high-level generalizations.
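Why a narrow hidden layer can only support coarse generalizations can be illustrated with a toy calculation (the random projection merely stands in for a trained hidden layer, and all sizes are arbitrary assumptions): the fewer hidden units the input has to pass through, the closer the representations of distinct patterns are pushed together, leaving room for only a few broad structural classes.

```python
import numpy as np

rng = np.random.default_rng(1)

def min_separation(n_patterns, n_hidden):
    """Project distinct one-hot patterns through a random tanh layer and
    return the smallest pairwise distance between the hidden codes.
    Values near zero mean the layer can no longer keep patterns apart."""
    X = np.eye(n_patterns)                      # n distinct input structures
    W = rng.normal(0.0, 1.0, (n_patterns, n_hidden))
    H = np.tanh(X @ W)                          # hidden representations
    return min(
        np.linalg.norm(H[i] - H[j])
        for i in range(n_patterns)
        for j in range(i + 1, n_patterns)
    )

for n_hidden in (2, 4, 16, 64):
    print(f"{n_hidden:3d} hidden units -> min separation "
          f"{min_separation(50, n_hidden):.3f}")
```

In this toy run, 50 patterns squeezed through two hidden units leave some representations nearly coinciding, forcing the layer to treat them as one class, while 64 units keep every pattern in its own region of the state space.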

The final choice of grain size will ideally be the most useful structural level of the internal input representation for meeting the output requirements, given the network's architecture. In chapter 3 the properties of connectionist networks will be discussed in more detail.

Frequency and Regularity<br />

Structural regularity is the occurrence of similarities between different structures on a certain grain level. For example, the English SRC is more regular than the ORC because on the level of functional categories (SVO) the SRC is similar to many other structures. In contrast, the corpus frequency of OSV as in the ORC is very low. In that sense,

