CHAPTER 4. STOCHASTIC CONTEXT-FREE GRAMMARS

the number caused the resulting grammars to be less general in several cases, and giving significantly fewer samples (10) would produce over-general results. We already discussed why an evaluation function should, in fact, depend on the actual number of samples, not just their relative frequencies, and one can conclude that Cook's evaluation function is tuned to be roughly equivalent to the Bayesian posterior for a sample count around 50.

Palindromes

The simple palindrome language $\{ww^R : w \in \{a,b\}^+\}$ has proven to be a relatively difficult benchmark language in several investigations into CFG learning algorithms. Cook et al. (1976) discuss it briefly as a language beyond the reach of their search operators, noting that it would require a chunking operator that does not blindly replace all occurrences of a chunk. To learn this grammar, one has to start by chunking aa and bb, sequences that typically also occur in non-center position, thereby misleading a strictly greedy learning strategy.

Pereira & Schabes (1992) use the same language to show that standard SCFG estimation does very poorly in finding a grammar of the right form when started with random initial parameters. The results of estimation can be improved considerably by using bracketed samples instead (although the resulting grammar still gives incorrect parse structure to some strings).

Indeed, our algorithm fails to find a good grammar for this language using only best-first search, due to the limitations of the chunking operation cited by Cook.[12] However, the result improved when applying the more powerful beam search (beam width 3).

    S --> A A
    S --> B B
    S --> A S A
    S --> B S B
    S --> S S    ***
    A --> a
    B --> b

This is almost a perfect grammar for the language, except for the production marked by ***. This production is redundant (not strictly required for the derivation of any of the samples) and can be eliminated by a simple Viterbi reestimation step following the search process. (The Viterbi step is meant to correct counts that have become inaccurate due to the optimistic update procedure during merging.)

While this result in itself is not very significant, it does show that the simple merging operators become considerably more powerful when combined with more sophisticated search techniques, as expected.

Incidentally, the palindrome language becomes very easy to learn from bracketed samples, using merging only.

[12] For concreteness, we used the same training setup as for the easier palindrome language with center symbol (fifth in Table 4.1). The samples and their number were identical except for removal of the center marker.
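The remark about the starred production can be checked mechanically. The following is a small sketch, not from the dissertation: rule choices are uniform rather than using the learned probabilities, and the generate/is_palindrome helpers are invented for illustration. Generating from the grammar above without S --> S S yields only even-length palindromes, while including that production also yields strings such as aabb, a concatenation of two palindromes.

    import random

    def is_palindrome(s):
        return s == s[::-1]

    # Productions of the learned grammar shown above; the last S rule is the
    # redundant one marked by ***.
    GRAMMAR = {
        "S": [["A", "A"], ["B", "B"], ["A", "S", "A"], ["B", "S", "B"], ["S", "S"]],
        "A": [["a"]],
        "B": [["b"]],
    }

    def generate(symbol="S", include_redundant=True, depth=0, max_depth=12):
        """Expand `symbol` by choosing productions uniformly at random."""
        if symbol not in GRAMMAR:
            return symbol  # terminal symbol
        rules = GRAMMAR[symbol]
        if symbol == "S":
            if not include_redundant:
                rules = rules[:4]   # drop S --> S S
            if depth >= max_depth:
                rules = rules[:2]   # only S --> A A | B B, forcing termination
        rhs = random.choice(rules)
        return "".join(generate(s, include_redundant, depth + 1, max_depth) for s in rhs)

    if __name__ == "__main__":
        random.seed(0)
        # Without the *** rule, every generated string is an even-length palindrome.
        for _ in range(500):
            s = generate(include_redundant=False)
            assert is_palindrome(s) and len(s) % 2 == 0
        # With S --> S S, non-palindromes such as "aabb" can also be derived.
        samples = {generate(include_redundant=True) for _ in range(500)}
        print(sorted(s for s in samples if not is_palindrome(s))[:5])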
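For readers unfamiliar with the two search regimes being contrasted, the following generic beam-search skeleton may help; it is not the dissertation's code, and the posterior and successors arguments are hypothetical placeholders for the posterior evaluation and the chunking/merging operators. A beam of width 1 reduces to the greedy best-first search that fails on this language; a wider beam keeps temporarily lower-scoring candidates alive.

    def beam_search(initial, posterior, successors, beam_width=3, max_steps=100):
        """Keep the `beam_width` best candidates at each step instead of only one."""
        beam = [initial]
        best = initial
        for _ in range(max_steps):
            # Expand every grammar currently on the beam.
            candidates = []
            for g in beam:
                candidates.extend(successors(g))
            if not candidates:
                break
            # Rank all successors by posterior probability and keep the top few.
            candidates.sort(key=posterior, reverse=True)
            beam = candidates[:beam_width]
            if posterior(beam[0]) > posterior(best):
                best = beam[0]
        return best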
