
The dissertation of Andreas Stolcke is approved: University of ...

CHAPTER 6. EFFICIENT PARSING WITH STOCHASTIC CONTEXT-FREE GRAMMARS

A state produced by prediction is called a predicted state. Each prediction corresponds to a potential expansion of a nonterminal in a left-most derivation.

Scanning

For each state

$$i:\ {}_{k}X \rightarrow \lambda . a\mu$$

where $a$ is a terminal symbol that matches the current input $x_i$, add the state

$$i+1:\ {}_{k}X \rightarrow \lambda a . \mu$$

(move the dot over the current symbol). A state produced by scanning is called a scanned state. Scanning ensures that the terminals produced in a derivation match the input string.

Completion

For each complete state

$$i:\ {}_{j}Y \rightarrow \nu .$$

and each state in set $j$ ($j \le i$) that has $Y$ to the right of the dot,

$$j:\ {}_{k}X \rightarrow \lambda . Y\mu$$

add

$$i:\ {}_{k}X \rightarrow \lambda Y . \mu$$

(move the dot over the current nonterminal). A state produced by completion is called a completed state.[4] Each completion corresponds to the end of a nonterminal expansion started by a matching prediction step.

One crucial insight into the working of Earley's algorithm is that, although both prediction and completion feed themselves, there are only a finite number of states that can possibly be produced. Therefore recursive prediction and completion have to terminate eventually, and the parser can proceed to the next input via scanning.

To complete the description we need only specify the initial and final states. The parser starts out with

$$0:\ {}_{0} \rightarrow . S$$

where $S$ is the sentence nonterminal (note the empty left-hand side). After processing the last symbol, the parser verifies that

$$l:\ {}_{0} \rightarrow S .$$

has been produced (among possibly others), where $l$ is the length of the input $x$.
If at any intermediate stage a state set remains empty (because no states from the previous stage permit scanning), the parse can be aborted because an impossible prefix has been detected.

States with empty LHS such as those above are useful in other contexts, as will be shown in Section 6.5.4. We will collectively refer to them as dummy states. Dummy states enter the chart only as a result of initialization, as opposed to being derived from grammar productions.

[4] Note the difference between complete and completed states: complete states (those with the dot to the right of the entire RHS) are the result of a completion or scanning step, but completion also produces states which are not yet complete.
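The operations described above can be sketched as a small recognizer. This is an illustrative sketch only, not the dissertation's implementation: it is non-probabilistic, it assumes a grammar without empty right-hand sides, and the names `State`, `DUMMY`, `next_symbol` and `recognize` are hypothetical. A state $i:\ {}_{k}X \rightarrow \lambda . \mu$ is stored in `chart[i]` as a tuple holding the start index $k$, the left-hand side, the right-hand side and the dot position.

```python
from collections import namedtuple

# One Earley state; the state-set index i is the position in `chart`.
State = namedtuple("State", "start lhs rhs dot")

DUMMY = ""  # assumption: stands for the empty LHS of the dummy states


def next_symbol(state):
    """Symbol to the right of the dot, or None if the state is complete."""
    return state.rhs[state.dot] if state.dot < len(state.rhs) else None


def advance(state):
    """Copy of `state` with the dot moved one symbol to the right."""
    return State(state.start, state.lhs, state.rhs, state.dot + 1)


def recognize(grammar, sentence_symbol, tokens):
    """grammar maps each nonterminal to a list of non-empty RHS tuples."""
    n = len(tokens)
    chart = [set() for _ in range(n + 1)]
    chart[0].add(State(0, DUMMY, (sentence_symbol,), 0))  # 0: _0 -> .S
    for i in range(n + 1):
        if not chart[i]:
            return False  # empty state set: impossible prefix, abort
        agenda = list(chart[i])
        while agenda:  # terminates because the set of states is finite
            st = agenda.pop()
            sym = next_symbol(st)
            new = []
            if sym is None:
                # Completion: advance states in set st.start waiting for st.lhs.
                new = [advance(p) for p in chart[st.start]
                       if next_symbol(p) == st.lhs]
            elif sym in grammar:
                # Prediction: one new state per rule expanding sym.
                new = [State(i, sym, rhs, 0) for rhs in grammar[sym]]
            elif i < n and tokens[i] == sym:
                # Scanning: the terminal matches the current input symbol.
                chart[i + 1].add(advance(st))
            for s in new:
                if s not in chart[i]:
                    chart[i].add(s)
                    agenda.append(s)
    # Final check: l: _0 -> S. must be present in the last state set.
    return State(0, DUMMY, (sentence_symbol,), 1) in chart[n]


# Toy left-recursive grammar (made up for illustration): E -> E + a | a
toy_grammar = {"E": [("E", "+", "a"), ("a",)]}
accepted = recognize(toy_grammar, "E", ["a", "+", "a"])
```

Because a state is added to the agenda only when it is not already in the current state set, left-recursive rules such as `E -> E + a` do not cause prediction to loop, which mirrors the finiteness argument in the text.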
