Connectionist Modeling of Experience-based Effects in Sentence ...
Chapter 1 Preliminaries
predictor. This shows – as Mitchell et al. (1995) also admit – that the exposure-based approach has to find a balance between coarse and more fine-grained measures, and that different structures might require different grain sizes for the tabulation of frequencies.
Structural Frequency in a Connectionist Network
A type of model that specifically bases its predictions on records of structural frequencies is the connectionist network. Mitchell et al. (1995) state that "in a connectionist system the records would consist of a set of activation weights distributed throughout the network" (p. 472). This is only partly true. The network does not explicitly count frequencies, nor are frequencies stored anywhere in the network. Rather, every exposure of the network to a specific structure immediately changes the weight distributions and thus the behavior of the whole network. One could therefore say that the weight distributions contain implicit structural knowledge. This can be observed, for example, in the activations of hidden layers: inputs of similar structures result in similar activation patterns in the hidden layer of simple recurrent networks (SRN; Elman, 1990). Comparing these patterns reveals the structural generalization levels that drive the network's predictions. Unlike symbolic exposure-based accounts, there is no explicitly fixed structural grain size the network is sensitive to. There is, of course, a lower limit on grain size, defined by the encoding level of the input: if the input string is encoded at the word level, the network has no information below that level to work with. The upper limit depends on the network's architecture and can be affected by the size of the hidden layer, the learning mechanism, and, specifically for recurrent networks, by their "memory span". Which levels the network actually chooses is hard to say in advance. Learning is a walk through a state space in search of the optimal solution to the desired input-output pairing. The choice of grain size is part of that optimization process and can change during the learning phase. A commitment to a specific grain size implicitly involves a commitment to the number of structures to distinguish. A fine-grained model will consequently have to keep apart many structural representations, while a very coarse-grained model has only a few structures to deal with. This relation means that a network with very few hidden nodes, through which all information has to be passed, will only be able to make very high-level generalizations. The final choice of grain size will ideally be the most useful structural level of the internal input representation for meeting the output requirements given the network's architecture. The properties of connectionist networks will be discussed in more detail in chapter 3.
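The SRN mechanics described above can be sketched in a few lines. The following is a minimal, illustrative sketch only: the vocabulary, dimensions, and weight initialization are arbitrary assumptions, not parameters from Elman (1990), and the network is untrained, so it demonstrates the word-level encoding and the recurrent hidden-state update rather than learned structural generalization.

```python
import numpy as np

# Minimal sketch of a simple recurrent network (SRN; Elman, 1990).
# All names and sizes below are illustrative assumptions.
rng = np.random.default_rng(0)

VOCAB = ["the", "dog", "cat", "chased", "saw"]  # toy word-level input encoding
HIDDEN = 8                                       # hidden layer size (arbitrary)

W_xh = rng.normal(scale=0.5, size=(HIDDEN, len(VOCAB)))  # input -> hidden weights
W_hh = rng.normal(scale=0.5, size=(HIDDEN, HIDDEN))      # recurrent hidden -> hidden weights

def one_hot(word):
    """Encode a word at the word level; the network sees nothing below this."""
    v = np.zeros(len(VOCAB))
    v[VOCAB.index(word)] = 1.0
    return v

def hidden_states(sentence):
    """Run the SRN over a word sequence, returning the hidden state after
    each word. The recurrent term W_hh @ h is what gives the network its
    'memory span' over preceding context."""
    h = np.zeros(HIDDEN)
    states = []
    for word in sentence:
        h = np.tanh(W_xh @ one_hot(word) + W_hh @ h)
        states.append(h)
    return states

def cosine(a, b):
    """Cosine similarity, used to compare hidden activation patterns."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# In a trained network, structurally similar inputs would yield similar
# hidden activation patterns; with random weights this only shows the
# comparison one would carry out.
h1 = hidden_states(["the", "dog", "chased"])[-1]
h2 = hidden_states(["the", "cat", "chased"])[-1]
print(cosine(h1, h2))
```

After training on input-output pairings, comparing such hidden-state vectors across sentences is one way to probe which structural grain size the network has implicitly settled on.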
Frequency and Regularity<br />
Structural regularity is the occurrence of similarities between different structures at a certain grain level. For example, the English SRC is more regular than the ORC because, on the level of functional categories (SVO), the SRC is similar to many other structures. In contrast, the corpus frequency of OSV, as in the ORC, is very low. In that sense,