Introduction to Computational Linguistics
Introduction to Computational Linguistics
Introduction to Computational Linguistics
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
22. Shift–Reduce–Parsing 90<br />
< number > → < digit >|< number >< number ><br />
This grammar has the NTS–property. Any digit is of category < digit >, and a<br />
number can be broken down in<strong>to</strong> two numbers at any point. The standard definition<br />
is as follows.<br />
(220)<br />
< digit > → 0 | 1 | · · · | 9<br />
< number > → < digit >|< digit >< number ><br />
This grammar is not an NTS–grammar, because in the expression<br />
(221) < digt >< digit >< digit ><br />
the occurrence 〈< digit >, ε〉 of the string < digit >< digit > cannot be a<br />
constituent occurrence under any parse. Finally, the language of strings has no<br />
transparent grammar! This is because the string 732 possesses occurrences of the<br />
strings 73 and 32. These occurrences must be occurrences under one and the same<br />
parse if the grammar is transparent. But they overlap. Contradiction.<br />
Now, the strategy above is deterministic. It is easy <strong>to</strong> see that there is a nondeterministic<br />
algorithm implementing this idea that actually defines a machine<br />
parsing exactly the CFLs. It is a machine where you only have a stack, and symbols<br />
scanned are moved on<strong>to</strong> the stack. The <strong>to</strong>p k symbols are visible (with PDAs<br />
we had k = 1). If the last m symbols, m ≤ k, are the right hand side of a rule,<br />
you are entitled <strong>to</strong> replace it by the corresponding left hand side. (This style of<br />
parsing is called shift–reduce parsing. One can show that PDAs can emulate a<br />
shift–reduce parser.) The stack gets cleared by m − 1 symbols, so you might end<br />
up seeing more of the stack after this move. Then you may either decide that once<br />
again there is a right hand side of a rule which you want <strong>to</strong> replace by the left hand<br />
side (reduce), or you want <strong>to</strong> see one more symbol of the string (shift). In general,<br />
like with PDAs, the nondeterminism is unavoidable.<br />
There is an important class of languages, the so–called LR(k)–grammars. These<br />
are languages where the question whether <strong>to</strong> shift or <strong>to</strong> reduce can be based on a<br />
lookahead of k symbols. That is <strong>to</strong> say, the parser might not know directly if it<br />
needs <strong>to</strong> shift or <strong>to</strong> reduce, but it can know for sure if it sees the next k symbols (or<br />
whatever is left of the string). One can show that for a language which there is an<br />
LR(k)–grammar with k > 0 there also is a LR(1)–grammar. So, a lookahead of just<br />
one symbol is enough <strong>to</strong> make an informed guess (at least with respect <strong>to</strong> some<br />
grammar, which however need not generate the constituents we are interested in).