Connectionist Modeling of Experience-based Effects in Sentence ...
Chapter 1 Preliminaries
predictor. This shows – as Mitchell et al. (1995) also admit – that the exposure-based approach has to find a balance between coarse and more fine-grained measures, and that different structures might require different grain sizes for the tabulation of frequencies.
Structural Frequency in a Connectionist Network
A type of model that specifically bases its predictions on records of structural frequencies is the connectionist network. Mitchell et al. (1995) state that "in a connectionist system the records would consist of a set of activation weights distributed throughout the network" (p. 472). This is only partly true. The network does not explicitly count frequencies, nor are frequencies stored anywhere in the network. Rather, every exposure of the network to a specific structure immediately changes the weight distributions and thus the behavior of the whole network. One could therefore say that the weight distributions contain implicit structural knowledge. This can be observed, for example, in the activations of hidden layers: inputs of similar structures result in similar activation patterns in the hidden layer of simple recurrent networks (SRN; Elman, 1990). Comparing these patterns reveals the structural generalization levels that drive the network's predictions. Unlike symbolic exposure-based accounts, there is no explicitly fixed structural grain size the network is sensitive to. There is, of course, a lower limit on grain size, defined by the encoding level of the input: if the input string is encoded at the word level, the network has no information below that level to work with. The upper limit depends on the network's architecture and can be affected by the size of the hidden layer, the learning mechanism, and, specifically for recurrent networks, by their "memory span". Which levels the network actually chooses is hard to say in advance. Learning is a walk through a state space in search of the optimal solution to the desired input-output pairing. The choice of grain size is part of that optimization process and can change during the learning phase. A commitment to a specific grain size implicitly involves a commitment to the number of structures to distinguish. A fine-grained model will consequently have to keep apart many structural representations, while a very coarse-grained model has only a few structures to deal with. This relation means that a network with very few hidden nodes, through which all information has to be passed, will only be able to make very high-level generalizations. The final choice of grain size will ideally be the most useful structural level of the internal input representation for meeting the output requirements given the network's architecture. The properties of connectionist networks will be discussed in more detail in chapter 3.
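The SRN mechanics described above can be sketched in a few lines. The following is a minimal, illustrative sketch only: the vocabulary, dimensions, and weight initialization are arbitrary assumptions, not parameters from Elman (1990), and the network is untrained, so it demonstrates the word-level encoding and the recurrent hidden-state update rather than learned structural generalization.

```python
import numpy as np

# Minimal sketch of a simple recurrent network (SRN; Elman, 1990).
# All names and sizes below are illustrative assumptions.
rng = np.random.default_rng(0)

VOCAB = ["the", "dog", "cat", "chased", "saw"]  # toy word-level input encoding
HIDDEN = 8                                       # hidden layer size (arbitrary)

W_xh = rng.normal(scale=0.5, size=(HIDDEN, len(VOCAB)))  # input -> hidden weights
W_hh = rng.normal(scale=0.5, size=(HIDDEN, HIDDEN))      # recurrent hidden -> hidden weights

def one_hot(word):
    """Encode a word at the word level; the network sees nothing below this."""
    v = np.zeros(len(VOCAB))
    v[VOCAB.index(word)] = 1.0
    return v

def hidden_states(sentence):
    """Run the SRN over a word sequence, returning the hidden state after
    each word. The recurrent term W_hh @ h is what gives the network its
    'memory span' over preceding context."""
    h = np.zeros(HIDDEN)
    states = []
    for word in sentence:
        h = np.tanh(W_xh @ one_hot(word) + W_hh @ h)
        states.append(h)
    return states

def cosine(a, b):
    """Cosine similarity, used to compare hidden activation patterns."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# In a trained network, structurally similar inputs would yield similar
# hidden activation patterns; with random weights this only shows the
# comparison one would carry out.
h1 = hidden_states(["the", "dog", "chased"])[-1]
h2 = hidden_states(["the", "cat", "chased"])[-1]
print(cosine(h1, h2))
```

After training on input-output pairings, comparing such hidden-state vectors across sentences is one way to probe which structural grain size the network has implicitly settled on.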
Frequency and Regularity<br />
Structural regularity is the occurrence of similarities between different structures at a certain grain level. For example, the English SRC is more regular than the ORC because, on the level of functional categories (SVO), the SRC is similar to many other structures. In contrast, the corpus frequency of OSV, as in the ORC, is very low. In that sense,