CHAPTER 6. EFFICIENT PARSING WITH STOCHASTIC CONTEXT-FREE GRAMMARS

4. How should the parameters (e.g., rule probabilities) be chosen to maximize the probability over a training set of strings?

The incremental model merging algorithm for SCFGs (Chapter 4) requires either (1) or (2) for efficient operation. Traditional grammar parameter estimation is essentially (4), and is typically also used as a post-processing step to model merging (after the grammar structure has been learned). The algorithm described in this chapter can compute solutions to all four of these problems in a single framework, with a number of additional advantages over previously presented isolated solutions. It was originally developed solely as a general and efficient tool and accessory to the model merging algorithm. We then realized that it also solves task (3) in an efficient and elegant fashion, greatly expanding its usefulness, as described below.

Most probabilistic parsers are based on a generalization of bottom-up chart parsing, such as the CYK algorithm. Partial parses are assembled just as in non-probabilistic parsing (modulo possible pruning based on probabilities), while substring probabilities (also known as 'inside' probabilities) can be computed in a straightforward way. Thus, the CYK chart parser underlies the 'standard' solutions to problems (1) and (4) (Baker 1979), as well as (2) (Jelinek 1985). While the Jelinek & Lafferty (1991) solution to problem (3) is not a direct extension of CYK parsing, they nevertheless present their algorithm in terms of its similarities to the computation of inside probabilities.

In our algorithm, computations for tasks (1) and (3) proceed incrementally, as the parser scans its input from left to right; in particular, prefix probabilities are available as soon as the prefix has been seen, and are updated incrementally as it is extended. Tasks (2) and (4) require one more (reverse) pass over the parse table constructed from the input.

Incremental, left-to-right computation of prefix probabilities is particularly important since that is a necessary condition for using SCFGs as a replacement for finite-state language models in many applications, such as speech decoding. As pointed out by Jelinek & Lafferty (1991), knowing the probabilities $P(x_0 \ldots x_i)$ for arbitrary prefixes $x_0 \ldots x_i$ enables probabilistic prediction of possible follow-words $x_{i+1}$, as $P(x_{i+1} \mid x_0 \ldots x_i) = P(x_0 \ldots x_i x_{i+1}) / P(x_0 \ldots x_i)$. These conditional probabilities can then be used as word transition probabilities in a Viterbi-style decoder or to incrementally compute the cost function for a stack decoder (Bahl et al. 1983).

Another application where prefix probabilities play a central role is the extraction of $n$-gram probabilities from SCFGs, a problem that is the subject of Chapter 7. Here, too, efficient incremental computation saves time since the work for common prefix strings can be shared.
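As an illustration (not part of the original text), the relation above turns a prefix probability routine directly into a follow-word predictor. In the sketch below, prefix_probability is a hypothetical stand-in for the Earley-based computation developed in this chapter; any function returning $P(x_0 \ldots x_i)$ for a given prefix would serve.

```python
# Illustrative sketch only: prefix_probability() is a hypothetical stand-in
# for the Earley-based prefix probability computation of this chapter.

def follow_word_distribution(prefix_probability, prefix, vocabulary):
    """Return P(w | prefix) for each candidate next word w, via
    P(x_{i+1} | x_0...x_i) = P(x_0...x_i x_{i+1}) / P(x_0...x_i)."""
    p_prefix = prefix_probability(prefix)
    if p_prefix == 0.0:
        return {}  # the prefix cannot be extended to any string in the language
    return {w: prefix_probability(prefix + [w]) / p_prefix for w in vocabulary}

# The resulting conditional probabilities can serve as word transition
# probabilities in a Viterbi-style decoder, or as increments to the cost
# function of a stack decoder.
```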
The key to most of the features of our algorithm is that it is based on the top-down parsing method for non-probabilistic CFGs developed by Earley (1970). Earley's algorithm is appealing because it runs with best-known efficiency on a number of special classes of grammars. In particular, Earley parsing is more efficient than the bottom-up methods in cases where top-down prediction can rule out potential parses of substrings. The worst-case computational expense of the algorithm (either for the complete input, or incrementally for each new word) is as good as that of the other known specialized algorithms, but can be substantially better on well-known grammar classes.
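For readers unfamiliar with Earley's method, the following is a minimal sketch of a non-probabilistic Earley recognizer showing the three operations (prediction, scanning, completion) that the probabilistic algorithm of this chapter builds on. The grammar encoding and the omission of empty productions are simplifications made for this illustration; it is not the implementation described in the dissertation.

```python
# Minimal non-probabilistic Earley recognizer (illustration only; epsilon
# productions are not handled). A grammar is a dict mapping each nonterminal
# to a list of right-hand sides (lists of symbols); symbols not in the dict
# are treated as terminals.

def earley_recognize(grammar, words, start="S"):
    # A state is (lhs, rhs, dot, origin): the rule lhs -> rhs with the dot
    # before rhs[dot], predicted starting at input position 'origin'.
    chart = [set() for _ in range(len(words) + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, tuple(rhs), 0, 0))

    for i in range(len(words) + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:
                    # Predictor: expand a nonterminal right of the dot (top-down).
                    for prod in grammar[sym]:
                        state = (sym, tuple(prod), 0, i)
                        if state not in chart[i]:
                            chart[i].add(state)
                            agenda.append(state)
                elif i < len(words) and sym == words[i]:
                    # Scanner: the dot precedes a terminal matching the next word.
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:
                # Completer: a finished constituent advances the states that predicted it.
                for plhs, prhs, pdot, porigin in list(chart[origin]):
                    if pdot < len(prhs) and prhs[pdot] == lhs:
                        state = (plhs, prhs, pdot + 1, porigin)
                        if state not in chart[i]:
                            chart[i].add(state)
                            agenda.append(state)

    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[len(words)])

# Example: a toy grammar accepting "det n v det n".
toy = {"S": [["NP", "VP"]], "NP": [["det", "n"]], "VP": [["v", "NP"]]}
print(earley_recognize(toy, ["det", "n", "v", "det", "n"]))  # True
```

The probabilistic version developed in this chapter keeps this control structure and attaches probabilities to the states, accumulating them during prediction and completion.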
