The dissertation of Andreas Stolcke is approved: University of ...

CHAPTER 6. EFFICIENT PARSING WITH STOCHASTIC CONTEXT-FREE GRAMMARS

For robust parsing, we want to identify all nonterminals that can possibly generate any substring of the input. This can be accomplished by also placing dummy states $k: {}_k \to .X$ for all positions $k$ and nonterminals $X$ in the Earley chart prior to the start of normal operation. (In practice, dummy states need to be added only for those nonterminals $X$ whose expansion can start with the current input symbol. This technique is discussed in Section 6.6.3.2.)

The immediate effect of these extra states is that more predictions will be generated, from which more completions follow, and so on. After finishing the processing of the $i$th state set, the chart will contain states $i: {}_k \to X.$, indicating that nonterminal $X$ generates the substring $x_k \ldots x_{i-1}$. Table 6.3(a) illustrates the robust parsing process using the example grammar from Table 6.1 (p. 128).

Probabilities in the extra states are handled as follows. The initial dummy states $k: {}_k \to .X$ are initialized with a forward probability of zero. This ensures that the forward probabilities of all extra states remain at zero and do not interfere with the computation of prefix probabilities from the regular Earley states. Inner probabilities on dummy states are initialized to unity, just as for the start state, and processed in the usual way. The inner probability for each substring/nonterminal pair can then be read off the complete dummy states.

Viterbi probabilities and Viterbi back-pointers can also be processed unchanged.
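
The seeding scheme and the probability bookkeeping described above can be illustrated with a small Python sketch. It is a hypothetical illustration, not the dissertation's implementation: the function names `robust_inner` and `_combine`, the grammar format (a dict from nonterminal to `(rhs_tuple, probability)` pairs) and the toy grammar are all assumptions. To stay short, it seeds dummy states for every nonterminal at every position (rather than only those whose expansion can start with the next input symbol), tracks only inner probabilities (the dummy states' forward probabilities would remain zero anyway), omits back-pointers, and assumes a grammar with no ε-productions and no unit productions, so completions can simply be processed in order of decreasing start position. The general case needs the unit-production closure of the full probabilistic Earley algorithm. Passing `viterbi=True` replaces the sum over paths with a maximum, yielding Viterbi probabilities instead of inner probabilities.

```python
"""Sketch of robust Earley parsing: seed dummy states, read substring probabilities."""


def _combine(table, state, value, viterbi):
    """Add one path probability to a state: sum for inner, max for Viterbi."""
    old = table.get(state)
    if old is None:
        table[state] = value
    elif viterbi:
        table[state] = max(old, value)
    else:
        table[state] = old + value


def robust_inner(grammar, words, viterbi=False):
    """Return {(X, k, i): p} for each nonterminal X that generates words[k:i].

    grammar maps each nonterminal to a list of (rhs_tuple, probability);
    symbols that are not keys of grammar are terminals. p is the inner
    probability, or the best-parse probability when viterbi=True.
    """
    for lhs, rules in grammar.items():
        for rhs, _ in rules:
            # Assumption of this sketch: no epsilon and no unit productions,
            # so a completed state is final once all later-starting ones are.
            assert rhs, f"epsilon rule for {lhs} not supported"
            assert not (len(rhs) == 1 and rhs[0] in grammar), f"unit rule {lhs} -> {rhs}"

    n = len(words)
    # chart[i] maps a state (lhs, rhs, dot, start) to its inner probability.
    chart = [{} for _ in range(n + 1)]
    # Seed dummy states k: _k -> .X (lhs is None) with inner probability 1
    # for every position and every nonterminal before parsing starts.
    for k in range(n + 1):
        for x in grammar:
            chart[k][(None, (x,), 0, k)] = 1.0

    for i in range(n + 1):
        # Completion: handle states ending at i, latest start first, so each
        # state's probability is complete before it is propagated.
        for k in range(i - 1, -1, -1):
            done = [(s, g) for s, g in chart[i].items()
                    if s[0] is not None and s[2] == len(s[1]) and s[3] == k]
            for (y, _, _, _), g_done in done:
                for (x, rhs, dot, start), g in chart[k].items():
                    if dot < len(rhs) and rhs[dot] == y:
                        _combine(chart[i], (x, rhs, dot + 1, start), g * g_done, viterbi)

        # Prediction (closed under chaining): a predicted state's inner
        # probability is the probability of its production.
        agenda = list(chart[i])
        while agenda:
            _, rhs, dot, _ = agenda.pop()
            if dot < len(rhs) and rhs[dot] in grammar:
                y = rhs[dot]
                for body, p in grammar[y]:
                    new = (y, body, 0, i)
                    if new not in chart[i]:
                        chart[i][new] = p
                        agenda.append(new)

        # Scanning: advance states whose next symbol matches the input word.
        if i < n:
            for (x, rhs, dot, start), g in chart[i].items():
                if dot < len(rhs) and rhs[dot] not in grammar and rhs[dot] == words[i]:
                    chart[i + 1][(x, rhs, dot + 1, start)] = g

    # Complete dummy states i: _k -> X. give the substring/nonterminal table.
    return {(rhs[0], start, i): g
            for i in range(n + 1)
            for (lhs, rhs, dot, start), g in chart[i].items()
            if lhs is None and dot == 1}


if __name__ == "__main__":
    # Hypothetical toy grammar (not the grammar of Table 6.1).
    toy = {
        "S": [(("NP", "VP"), 1.0)],
        "NP": [(("Det", "N"), 1.0)],
        "VP": [(("VT", "NP"), 1.0)],
        "Det": [(("a",), 1.0)],
        "N": [(("circle",), 0.5), (("square",), 0.5)],
        "VT": [(("touches",), 1.0)],
    }
    table = robust_inner(toy, "circle touches a square".split())
    print(table[("VP", 1, 4)])   # 0.5
    print(("S", 0, 4) in table)  # False
```

On the ungrammatical input "circle touches a square", no S spans the whole string, yet the table still reports that VP generates "touches a square" with inner probability 0.5, which is the kind of partial information robust parsing aims to recover.
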
Applying the Viterbi-parse procedure from Section 6.5.1 to the complete dummy states yields Viterbi parses for all substring/nonterminal pairs.

6.5.4.2 Assembling partial parses

Instead of consulting the chart for individual substring/nonterminal pairs, it may be useful to obtain a list of all complete partial parses of the input. A complete partial parse is a sequence of nonterminals that together generate the input. For example, using the grammar in Table 6.1, the input "a circle touches above a square" has the complete partial parses 'NP VT PP' and 'Det N VT P NP', among others. The input is grammatical if and only if $S$ is among the complete partial parses.

First note that there may not exist a complete partial parse if the input contains unknown symbols. As a preprocessing step, or on-line during parsing, one may have to create new preterminals to account for such new input symbols.

The Earley algorithm can be minimally extended to also generate the list of all partial parses. What is needed is some device that assembles abutting nonterminals from partial parses left-to-right. This work can be carried out as a by-product of the normal completion process using the concept of a variable state. A variable state is a special kind of dummy state in which the RHS can have any number of nonterminals to the
