Notes on computational linguistics.pdf - UCLA Department of ...
Stabler - Lx 185/209 2003

8.1.12 Markov models in human syntactic analysis?
(92) Shannon (1948, pp. 42-43) says:

We can also approximate to a natural language by means of a series of simple artificial languages…To give a visual idea of how this series approaches a language, typical sequences in the approximations to English have been constructed and are given below…

5. First-order word approximation…Here words are chosen independently but with their appropriate frequencies.

REPRESENTING AND SPEEDILY IS AN GOOD APT OR COME CAN DIFFERENT NATURAL HERE HE THE IN CAME THE TO OF TO EXPERT GRAY COME TO FURNISHES THE LINE MESSAGE HAD BE THESE
6. Second-order word approximation. The word transition probabilities are correct but no further structure is included.

THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER METHOD FOR THE LETTERS THAT THE TIME OF WHO EVER TOLD THE PROBLEM FOR AN UNEXPECTED
The resemblance to ordinary English text increases quite noticeably at each of the above steps…It appears then that a sufficiently complex stochastic source process will give a satisfactory representation of a discrete source.
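Shannon's first- and second-order word approximations are easy to reproduce in miniature. The sketch below uses a small hypothetical toy corpus (not Shannon's source text): the first-order sample draws words independently with their corpus frequencies, and the second-order sample draws each word conditioned on the previous one.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy corpus standing in for Shannon's source text.
corpus = ("the head of the line came to the point and "
          "the expert told the problem to the writer").split()

# First-order approximation: words chosen independently,
# but with their corpus frequencies.
unigrams = Counter(corpus)
words, weights = zip(*unigrams.items())
first_order = random.choices(words, weights=weights, k=10)

# Second-order approximation: each word chosen given the previous word,
# using the corpus's bigram transition counts.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def second_order(length, start="the"):
    out = [start]
    for _ in range(length - 1):
        nexts = bigrams[out[-1]]
        if not nexts:  # dead end (word only occurs corpus-finally): restart
            out.append(random.choice(corpus))
            continue
        ws, cs = zip(*nexts.items())
        out.append(random.choices(ws, weights=cs)[0])
    return out

print(" ".join(first_order))
print(" ".join(second_order(10)))
```

With a corpus this small the second-order output mostly replays stretches of the source text, which already illustrates the sparse-data worry raised in (93).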
(93) Damerau (1971) confirms this trend in an experiment that involved generating 5th-order approximations. All these results are hard to interpret, though, since (i) sparse data in generation will tend to yield near copies of portions of the source texts (on the sparse data problem, remember the results from Jelinek mentioned in (95) below), and (ii) human linguistic capabilities are not well reflected in typical texts.
(94) Miller and Chomsky objection 1: The number of parameters to set is enormous.

Notice that for a vocabulary of 100,000 words, where each different word is emitted by a different event, we would need at least 100,000 states. The full transition matrix then has 100,000^2 = 10^10 entries. Notice that the last column of the transition matrix is redundant, and so a 10^9 matrix will do.

Miller and Chomsky (1963, p. 430) say:

We cannot seriously propose that a child learns the value of 10^9 parameters in a childhood lasting only 10^8 seconds.

Why not? This is very far from obvious, unless the parameters are independent, and there is no reason to assume they are.
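The arithmetic behind the objection is easy to check. The figures below follow the numbers in the text (a 100,000-word vocabulary; a childhood taken as roughly ten years):

```python
# Back-of-the-envelope check of the counts in Miller & Chomsky's objection 1.
vocab = 100_000
full_matrix = vocab ** 2                   # entries in the full transition matrix
childhood_seconds = 10 * 365 * 24 * 3600   # ~10 years, in seconds

print(full_matrix)        # 10000000000, i.e. 10^10
print(childhood_seconds)  # 315360000, i.e. roughly 3 * 10^8
```

So the objection amounts to: more than one parameter would have to be set per second of childhood — which, as noted above, is only decisive if the parameters must be set independently.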
(95) Miller and Chomsky (1963, p. 430) objection 2: The amount of input required to set the parameters of a reasonable model is enormous.

Jelinek (1985) reports that after collecting the trigrams from a 1,500,000-word corpus, he found that, in the next 300,000 words, 25% of the trigrams were new.
No surprise! Some generalization across lexical combinations is required. In this context, the "generalization" is sometimes achieved with various "smoothing" functions, which will be discussed later. With generalization, setting large numbers of parameters becomes quite conceivable.

Without a better understanding of the issues, I find objection 2 completely unpersuasive.
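As a preview of the smoothing functions mentioned above, here is a minimal sketch of one of the simplest, add-one (Laplace) smoothing for trigrams, on a hypothetical toy corpus. Adding one to every trigram count gives unseen combinations — like the 25% of new trigrams in Jelinek's held-out data — a nonzero probability:

```python
from collections import Counter

# Hypothetical toy corpus for illustration.
corpus = "the head of the line came to the point of the line".split()

trigrams = Counter(zip(corpus, corpus[1:], corpus[2:]))
bigrams = Counter(zip(corpus, corpus[1:]))
V = len(set(corpus))  # vocabulary size

def p_smoothed(w1, w2, w3):
    # P(w3 | w1 w2) with one added to every trigram count, so that
    # unseen trigrams still receive some probability mass.
    return (trigrams[(w1, w2, w3)] + 1) / (bigrams[(w1, w2)] + V)

print(p_smoothed("the", "head", "of"))    # seen trigram
print(p_smoothed("the", "head", "came"))  # unseen trigram, still > 0
```

Seen trigrams remain more probable than unseen ones, but no trigram is assigned probability zero; this is the sense in which smoothing "generalizes" across lexical combinations.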
(96) Miller and Chomsky (1963, p. 425) objection 3:

Since human messages have dependencies extending over long strings of symbols, we know that any pure Markov source must be too simple…