12.07.2015 Views

The dissertation of Andreas Stolcke is approved: University of ...


CHAPTER 6. EFFICIENT PARSING WITH STOCHASTIC CONTEXT-FREE GRAMMARS

                 Full CNF                         Sparse CFG
Bottom-up        Inside/outside (Baker 1979)      Stochastic RTNs (Kupiec 1992a)
Left-to-right    LRI (Jelinek & Lafferty 1991)    Probabilistic Earley

Table 6.4: Tentative typology of SCFG algorithms according to prevailing directionality and sparseness of the CFG.

zero. It appears that these algorithms tend to be more naturally formulated in terms of a stochastic process, as opposed to static specifications of string probabilities.

To illustrate these points, the algorithms discussed in this section have been arranged in the grid depicted in Table 6.4.

6.8 Summary

We have presented an Earley-based parser for stochastic context-free grammars that is appealing for its combination of advantages over existing methods. Earley’s control structure makes it run with best-known complexity on a number of special grammar classes, and no worse than standard bottom-up probabilistic chart parsers on fully parameterized SCFGs.

Unlike bottom-up parsers it also computes accurate prefix probabilities incrementally while scanning its input, along with the usual substring (inside) probabilities. The chart constructed during parsing supports both Viterbi parse extraction and Baum-Welch type rule probability estimation by way of a backward pass over the parser chart. If the input comes with (partial) bracketing to indicate phrase structure this information can be easily incorporated to restrict the allowable parses. A simple extension of the Earley chart allows finding partial parses of ungrammatical input.

The computation of probabilities is conceptually simple, and follows directly Earley’s parsing framework, while drawing heavily on the analogy to finite-state language models. It does not require rewriting the grammar into normal form. Thus, the present algorithm fills a gap in the existing array of algorithms for SCFGs, efficiently combining the functionalities and advantages of several previous approaches.

6.9 Appendix: LR item probabilities as conditional forward probabilities

In Section 6.7.3 an interpretation of LR item probabilities as defined in Wright (1990:Section 2.1) was given in terms of the forward probabilities used by the Earley parser. Below we give a proof for the correctness of this interpretation. Notice that these are the ‘ideal’ LR probabilities that should be attached to items, if it weren’t for the identification of items with close probabilities to keep the LR state list finite.
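The distinction drawn in the summary between string probabilities and prefix probabilities can be made concrete with a small sketch. The code below is not the Earley-based algorithm described in this chapter. It is a brute-force enumeration, truncated at a maximum string length, that only illustrates what the two quantities mean. The toy grammar and the names GRAMMAR, string_probs, and prefix_prob are assumptions introduced for illustration.

```python
from collections import defaultdict

# Hypothetical toy SCFG (not from the source): S -> a S b [0.4] | a b [0.6].
# It generates a^n b^n with probability 0.4**(n-1) * 0.6.
GRAMMAR = {
    "S": [(("a", "S", "b"), 0.4), (("a", "b"), 0.6)],
}


def string_probs(grammar, start, max_len):
    """Return {terminal tuple: probability} for all strings up to max_len.

    Expands leftmost derivations exhaustively. Assumes no empty and no unit
    productions, so a sentential form never shrinks and pruning is safe.
    """
    probs = defaultdict(float)
    stack = [((start,), 1.0)]
    while stack:
        form, p = stack.pop()
        # Position of the leftmost nonterminal, or None if only terminals remain.
        idx = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if idx is None:
            probs[form] += p  # sum over all derivations of the same string
            continue
        for rhs, rule_p in grammar[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            if len(new_form) <= max_len:
                stack.append((new_form, p * rule_p))
    return probs


def prefix_prob(probs, prefix):
    """Sum the probabilities of all enumerated strings that start with prefix."""
    n = len(prefix)
    return sum(p for s, p in probs.items() if s[:n] == prefix)


probs = string_probs(GRAMMAR, "S", max_len=40)
print("P(string = a a b b):", probs[("a", "a", "b", "b")])
print("P(prefix = a a):    ", prefix_prob(probs, ("a", "a")))
```

For this grammar the string a a b b has probability 0.4 × 0.6 = 0.24, while the prefix a a collects every a^n b^n with n ≥ 2, so the truncated sum approaches 0.4 as max_len grows. The enumeration is exponential in general and only a lower bound on the true prefix probability, which is the kind of cost and inexactness an incremental left-to-right computation avoids.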
