13.11.2014 Views

Introduction to Computational Linguistics

Introduction to Computational Linguistics

Introduction to Computational Linguistics

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

22. Shift–Reduce–Parsing 90<br />

< number > → < digit >|< number >< number ><br />

This grammar has the NTS–property. Any digit is of category < digit >, and a<br />

number can be broken down in<strong>to</strong> two numbers at any point. The standard definition<br />

is as follows.<br />

(220)<br />

< digit > → 0 | 1 | · · · | 9<br />

< number > → < digit >|< digit >< number ><br />

This grammar is not an NTS–grammar, because in the expression<br />

(221) < digt >< digit >< digit ><br />

the occurrence 〈< digit >, ε〉 of the string < digit >< digit > cannot be a<br />

constituent occurrence under any parse. Finally, the language of strings has no<br />

transparent grammar! This is because the string 732 possesses occurrences of the<br />

strings 73 and 32. These occurrences must be occurrences under one and the same<br />

parse if the grammar is transparent. But they overlap. Contradiction.<br />

Now, the strategy above is deterministic. It is easy <strong>to</strong> see that there is a nondeterministic<br />

algorithm implementing this idea that actually defines a machine<br />

parsing exactly the CFLs. It is a machine where you only have a stack, and symbols<br />

scanned are moved on<strong>to</strong> the stack. The <strong>to</strong>p k symbols are visible (with PDAs<br />

we had k = 1). If the last m symbols, m ≤ k, are the right hand side of a rule,<br />

you are entitled <strong>to</strong> replace it by the corresponding left hand side. (This style of<br />

parsing is called shift–reduce parsing. One can show that PDAs can emulate a<br />

shift–reduce parser.) The stack gets cleared by m − 1 symbols, so you might end<br />

up seeing more of the stack after this move. Then you may either decide that once<br />

again there is a right hand side of a rule which you want <strong>to</strong> replace by the left hand<br />

side (reduce), or you want <strong>to</strong> see one more symbol of the string (shift). In general,<br />

like with PDAs, the nondeterminism is unavoidable.<br />

There is an important class of languages, the so–called LR(k)–grammars. These<br />

are languages where the question whether <strong>to</strong> shift or <strong>to</strong> reduce can be based on a<br />

lookahead of k symbols. That is <strong>to</strong> say, the parser might not know directly if it<br />

needs <strong>to</strong> shift or <strong>to</strong> reduce, but it can know for sure if it sees the next k symbols (or<br />

whatever is left of the string). One can show that for a language which there is an<br />

LR(k)–grammar with k > 0 there also is a LR(1)–grammar. So, a lookahead of just<br />

one symbol is enough <strong>to</strong> make an informed guess (at least with respect <strong>to</strong> some<br />

grammar, which however need not generate the constituents we are interested in).

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!