CHAPTER 4. STOCHASTIC CONTEXT-FREE GRAMMARS

approaches is that they reuse previously hypothesized grammar structures, possibly preventing the algorithm from considering better alternatives.

1. Avoid duplicate samples: incorporate duplicate samples only once, with appropriately adjusted counts. This is a trivial optimization that can never do harm.

2. Try parsing samples first before resorting to the ordinary creation of new productions. If a new sample is parsed successfully, counts on the old productions are updated to reflect the new sample.[8] This method subsumes strategy 1 above. (See Section 6.5.3 for ways to efficiently handle the parsing of bracketed samples, which is needed if this method is to be applied to structured samples.) A code sketch illustrating this strategy appears at the end of this section.

3. To save initial merging of preterminals, reuse existing preterminals where possible. This precludes the creation of grammars with ambiguity at the level of lexical productions.

4. Try to parse the new sample into string fragments using existing rules, and add only a top-level production to link these fragments to the start symbol. This subsumes both strategy 2 and strategy 3. (Section 6.5.4 describes one approach to parsing ungrammatical samples into fragments that can be used here.)

Unless noted otherwise, only strategy 2 was used in obtaining the results reported here.

4.4 Related Work

4.4.1 Bayesian grammar learning by enumeration

We already mentioned Horning (1969) as an early proponent of the Bayesian version of grammar inference by enumeration, as the principle is general enough to be applied (in theory) to any type of probabilistic grammar. Horning's focus was actually on probabilistic CFGs, and the formal device used to enumerate grammars, as well as to assign prior probabilities, was a grammar-generating grammar, or grammar grammar. As expected, enumeration is not practical beyond the simplest target grammars, but Horning's work is theoretically important and was one of the first to point out the use of posterior probabilities as a formalization of the simplicity vs. data fit trade-off.

4.4.2 Merging and chunking based approaches

The idea of combining merging and chunking with a hill-climbing style search procedure to induce CFG structures seems to have been developed independently by several researchers. Below is a list of those we are aware of.

[8] If the sample is ambiguous, the counts could be updated for all derivations according to their respective probabilities, or only for the Viterbi derivation. In either case the likelihood of the sample will be underestimated by the Viterbi-based computation of the posterior probability. Updating according to the Viterbi derivation should favor the creation of unambiguous grammar structures, but no detailed comparisons have been done on this issue.
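To make strategy 2 concrete, here is a minimal Python sketch, not the implementation actually used in this work: the Grammar class, its viterbi_parse and add_new_productions methods, and the incorporate function are all illustrative names, and the toy parser only recognizes the flat productions this sketch itself creates (a real system would use a probabilistic Earley or CKY parser).

```python
from collections import defaultdict

class Grammar:
    """Toy SCFG holding usage counts for each production lhs -> rhs."""

    def __init__(self):
        # counts[lhs][rhs] = how often the production lhs -> rhs was used
        self.counts = defaultdict(lambda: defaultdict(float))

    def viterbi_parse(self, sample):
        """Return a derivation of `sample` as (lhs, rhs) pairs, or None.

        Toy version: it only recognizes the flat S -> preterminal...
        productions created below, and returns the first match rather
        than the most probable derivation. A real implementation would
        be a probabilistic Earley or CKY parser.
        """
        for rhs in list(self.counts["S"]):
            if len(rhs) == len(sample) and all(
                (word,) in self.counts[pre] for pre, word in zip(rhs, sample)
            ):
                return [("S", rhs)] + [
                    (pre, (word,)) for pre, word in zip(rhs, sample)
                ]
        return None

    def add_new_productions(self, sample):
        """Fallback: cover `sample` with fresh productions -- one flat
        top-level rule plus one preterminal per word (preterminals are
        reused across samples, in the spirit of strategy 3)."""
        preterminals = []
        for word in sample:
            pre = f"PRE_{word}"
            self.counts[pre][(word,)] += 1.0
            preterminals.append(pre)
        self.counts["S"][tuple(preterminals)] += 1.0

def incorporate(grammar, sample):
    """Strategy 2: parse first; only create new productions on failure."""
    derivation = grammar.viterbi_parse(sample)
    if derivation is not None:
        # Sample already covered: bump counts along the Viterbi
        # derivation. (Per footnote 8, one could instead distribute
        # fractional counts over all derivations of an ambiguous sample.)
        for lhs, rhs in derivation:
            grammar.counts[lhs][rhs] += 1.0
    else:
        grammar.add_new_productions(sample)

g = Grammar()
incorporate(g, ["the", "dog", "barks"])  # no parse yet: new productions created
incorporate(g, ["the", "dog", "barks"])  # now parses: only counts are updated
assert g.counts["S"][("PRE_the", "PRE_dog", "PRE_barks")] == 2.0
```

Note that a duplicate sample always parses against the productions created for its first occurrence, which is how this strategy subsumes strategy 1.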
