Notes on computational linguistics.pdf - UCLA Department of ...
Stabler - Lx 185/209 2003
construction.
(20) With the morphology in or4.pl and the grammar gh4.pl, we can parse:
showParse(['Titus',laughs]). showParse(['Titus',will,laugh]).
showParse(['Titus',eats,a,pie]). showParse([is,'Titus',laughing]).
showParse([does,'Titus',laugh]). showParse([what,does,'Titus',eat]).
(21) Obviously, more complex morphologies (and phonologies) can be represented by FSMs (Ellison, 1994;
Eisner, 1997), but they will all have domains and ranges that are regular languages.
17.3 Better models of the interface
The previous section shows how to translate from input text to written forms of the morphemes, whose syntactic
features are then looked up. It clearly makes more sense to translate from the input text directly to the
syntactic features, though we will not develop that idea here. In other words,
represent the lexicon as a finite state machine: input → feature sequences
This would allow us to remove some of the redundancy. In particular, whenever two feature sequences have a
common suffix, that suffix could be shared. However, this model has some other, more serious shortcomings.
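The suffix sharing mentioned here can be illustrated with a small sketch. It is not the representation used in or4.pl or gh4.pl, and it is only the storage half of the idea, not a full transducer from input text. The class name `SharedFeatureLexicon`, the toy entries, and the feature names are all assumptions made for illustration. Each feature sequence is stored as a chain of interned (feature, tail) cells, so two sequences that end the same way point at the same cells, much as merged states would in a finite state machine.

```python
class SharedFeatureLexicon:
    """Toy lexicon whose feature sequences share common suffixes."""

    def __init__(self):
        self._cells = {}    # interned (feature, tail) cells
        self._entries = {}  # word -> first cell of its feature sequence

    def _intern(self, features):
        # Build from the right so that equal suffixes map to one cell.
        tail = None
        for feature in reversed(features):
            cell = (feature, tail)
            tail = self._cells.setdefault(cell, cell)
        return tail

    def add(self, word, features):
        self._entries[word] = self._intern(features)

    def lookup(self, word):
        # Walk the chain of cells and collect the features in order.
        features = []
        cell = self._entries[word]
        while cell is not None:
            feature, cell = cell
            features.append(feature)
        return features

    def cell_count(self):
        return len(self._cells)


# Hypothetical entries; the feature names are illustrative only.
TOY_ENTRIES = {
    "laugh": ["=D", "V"],
    "sleep": ["=D", "V"],
    "eat": ["=D", "=D", "V"],
    "a": ["=N", "D"],
    "pie": ["N"],
}

lexicon = SharedFeatureLexicon()
for word, features in TOY_ENTRIES.items():
    lexicon.add(word, features)

# Features stored without sharing versus cells stored with sharing.
unshared = sum(len(features) for features in TOY_ENTRIES.values())
print(unshared, lexicon.cell_count())  # prints: 10 6
```

In this toy lexicon the sequence for "eat" reuses the cells already built for "laugh", so only one new cell is needed for it.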
17.3.1 Reduplication
In some languages, plurality or other meanings are sometimes expressed not by any particular phonetic string,
but by reduplication, as mentioned earlier on pages 24 and 182. It is easy to show that the language accepted
by any finite transducer is only a regular language, and hence that no finite transducer can recognize the
crossing relations apparently found in reduplication.
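To make the crossing relations concrete, here is a minimal sketch over toy strings. The function names `is_total_reduplication` and `crossing_pairs` are hypothetical, and only total reduplication (a string followed by an exact copy of itself) is covered. The check compares the two halves position by position, which means remembering a first half of unbounded length, and that is the resource a machine with a fixed number of states lacks.

```python
def is_total_reduplication(form):
    """Return True if form is some non-empty string repeated exactly twice."""
    half, remainder = divmod(len(form), 2)
    return remainder == 0 and half > 0 and form[:half] == form[half:]


def crossing_pairs(form):
    """Pair each position in the first copy with its match in the second.

    Returns an empty list when form is not a total reduplication.
    """
    if not is_total_reduplication(form):
        return []
    half = len(form) // 2
    return [(i, i + half) for i in range(half)]


print(is_total_reduplication("badbad"))  # prints: True
print(crossing_pairs("abab"))            # prints: [(0, 2), (1, 3)]
```

The pairs (0, 2) and (1, 3) cross: each dependency starts inside the span of the other, unlike the nested pairs (0, 3) and (1, 2) that a mirror-image string would give.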
17.3.2 Morphology without morphemes
Reduplication is only one of various kinds of morphemic alterations which do not involve simple affixation
of material with specific phonetic content. Morphemic content can be expressed by word-internal changes in
vowel quality, for example, or by prosodic cues. The idea that utterances are sequences of phonetically given
morphemes is not tenable (see, for example, Anderson, 1992). Rather, a range of morphological processes is
available, and the languages of the world make different selections from them. That means that having just
left and right adjunction as options in head movement is probably inadequate: we should allow various kinds
of expressions of the sequences of elements that we analyze in syntax.
17.3.3 Probabilistic models, and recognizing new words
When we hear new words, we often make assumptions, without hesitation, about how they would combine with
affixes. This suggests that some kind of similarity metric is at work. The relevant metric is by no means
clear yet, but a wide range of proposals are subsumed by imagining that there is some "edit distance" that
language learners use in identifying related lexical items. The basic idea is this: given some ways of changing
a string (e.g. by adding material to either end of the string, by changing some of the elements of the string, by
copying all or part of the string, etc.), a relation between pairs of strings is given by the number of operations
required to map one to the other. If these operations are weighted, then more and less likely relations can
be specified, and this metric can be adjusted based on what has already been learned (Ristad and Yianilos,
1996). This approach is subsumed by the more general perspective in which the similarity of two sequences
is assessed by the length of the shortest program that can produce one from the other (Chater and Vitányi,
2002).
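The weighted edit distance idea can be sketched with the standard dynamic programming recurrence. This is a simplified illustration, not the learned stochastic model of Ristad and Yianilos: the function name `weighted_edit_distance` and the three fixed cost parameters are assumptions, the costs are set by hand rather than learned, and the copying operation mentioned above is left out.

```python
def weighted_edit_distance(source, target, insert_cost=1.0, delete_cost=1.0,
                           substitute_cost=1.0):
    """Minimum total cost of turning source into target.

    Allowed operations: insert a symbol, delete a symbol, substitute one
    symbol for another. Matching symbols cost nothing.
    """
    # previous[j] is the cost of turning the source prefix read so far
    # into the first j symbols of target; start with the empty prefix.
    previous = [j * insert_cost for j in range(len(target) + 1)]
    for i, source_symbol in enumerate(source, start=1):
        current = [i * delete_cost]
        for j, target_symbol in enumerate(target, start=1):
            change = 0.0 if source_symbol == target_symbol else substitute_cost
            current.append(min(
                previous[j] + delete_cost,      # drop source_symbol
                current[j - 1] + insert_cost,   # add target_symbol
                previous[j - 1] + change,       # match or substitute
            ))
        previous = current
    return previous[-1]


print(weighted_edit_distance("walk", "walked"))                   # prints: 2.0
print(weighted_edit_distance("walk", "walked", insert_cost=0.5))  # prints: 1.0
```

Lowering the insertion cost makes suffixed forms count as closer to their stems, which is one simple way that weights can make some relations more likely than others.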