The dissertation of Andreas Stolcke is approved: University of ...

CHAPTER 4. STOCHASTIC CONTEXT-FREE GRAMMARS

The criterion used for model evaluation is the posterior of the model structure, as discussed in Section 2.5.7. It is approximated using the same Viterbi method as described for HMMs (Section 3.4.3). This again has the advantage that the posterior is decomposed into a product of terms, one for each grammar nonterminal. Changes to the posterior are computed efficiently by recomputing just the terms pertaining to nonterminals affected by the merging or chunking operation.

4.3.5 Search strategies

The question of how to search efficiently for good sequences of merging operators becomes more pressing in the case of SCFGs. The main reason is that the introduction of the new operator, chunking, creates a more complex topology in the search space. In addition, the evaluation of a chunking step is not directly comparable to that of a merging step, as chunking does not have a generalizing effect on the grammar (it can only affect the prior contribution to the posterior).

We have experimented with several search strategies for SCFG learning, discussed below. Clearly, more sophisticated ones are possible and await further study.

Best-first search
This is the straightforward extension of our approach to merging with HMMs. All operator types and application instances are pooled for the purpose of comparison, and at each step the locally best one is chosen. This is combined with the simple linear look-ahead extension described in Section 3.4.5 to help overcome local maxima.

This simple approach often fails because chunking typically has to be followed by several merging steps to produce an overall improvement. The look-ahead feature often does not help here, as other chunks get in the way between a chunking step and the 'right' successive merging choices.

Multi-level best-first search
One possible solution to the above problem is to make the search procedure aware of the different nature of the two operators, by constraining the way in which they interact. Empirically, the following simple extension of the best-first paradigm seems to work well for many SCFGs.

The basic idea is that the search operates on two distinct levels, associated with merging and chunking, respectively. Search at the merging level (level 1) consists of a best-first sequence of merging steps (with look-ahead). Search at the second level chooses the locally best chunking step, and then proceeds with a search at level 1. (Clearly, this approach could be generalized to any number of search levels.)

Notice that in this approach, the chunking steps are not evaluated by trying an exhaustive sequence of merges following each possible choice. This would entail an overhead that is quite significant even in small cases.

Beam search
In a beam search the locality of the search is relaxed by considering a pool of relatively good models simultaneously, rather than only a single one as in best-first search. In Section 3.3 we remarked that beam search for HMMs seems to give worthwhile improvements over the best-first approach only very rarely.
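The two-level control flow of the multi-level best-first search described above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the dissertation's implementation: the function names `merge_search` and `multilevel_search`, the exact meaning of the `lookahead` parameter (number of extra non-improving merge steps explored), and the outer stopping rule (stop when a chunk followed by merging no longer improves the score) are all assumptions. The toy state `(chunks applied, merges applied)` and the `GAIN` table are hypothetical stand-ins for a grammar and its log posterior, chosen so that a chunk only pays off after two subsequent merges.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

M = TypeVar("M")  # a model (here a toy state; in the text, a grammar)


def merge_search(model: M,
                 merges: Callable[[M], Iterable[M]],
                 score: Callable[[M], float],
                 lookahead: int = 1) -> M:
    """Level 1: best-first sequence of merges with linear look-ahead."""
    best, best_score = model, score(model)
    current, misses = model, 0
    while misses <= lookahead:
        candidates = list(merges(current))
        if not candidates:
            break
        current = max(candidates, key=score)  # locally best merge
        if score(current) > best_score:
            best, best_score, misses = current, score(current), 0
        else:
            misses += 1  # tolerate a few non-improving steps (look-ahead)
    return best


def multilevel_search(model: M,
                      merges: Callable[[M], Iterable[M]],
                      chunks: Callable[[M], Iterable[M]],
                      score: Callable[[M], float],
                      lookahead: int = 1) -> M:
    """Level 2: take the locally best chunk, then run a level-1 search."""
    best = merge_search(model, merges, score, lookahead)
    while True:
        candidates = list(chunks(best))
        if not candidates:
            return best
        # The chunk is chosen by its immediate score only; merges are not
        # tried exhaustively after every possible chunk.
        chunked = max(candidates, key=score)
        result = merge_search(chunked, merges, score, lookahead)
        if score(result) <= score(best):
            return best  # assumed stopping rule: no overall improvement
        best = result


# --- Hypothetical toy problem: state = (chunks applied, merges applied) ---
State = Tuple[int, int]
GAIN = [0.0, -0.5, 3.0, 2.5, 6.0]  # made-up payoff after m merges


def toy_score(state: State) -> float:
    c, m = state
    return GAIN[m] - 1.0 * c  # each chunk costs 1.0 (a prior-only penalty)


def toy_merges(state: State) -> List[State]:
    c, m = state
    return [(c, m + 1)] if m < 2 * c else []  # each chunk enables two merges


def toy_chunks(state: State) -> List[State]:
    c, m = state
    return [(c + 1, m)] if c < 2 else []


result = multilevel_search((0, 0), toy_merges, toy_chunks, toy_score, lookahead=1)
print(result, toy_score(result))  # prints: (2, 4) 4.0
```

In this toy setting, calling `multilevel_search` with `lookahead=0` returns the start state `(0, 0)`, because the first merge after a chunk lowers the score and the level-1 search gives up before reaching the merge that pays off.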
