Returning to the example, we now choose to merge states 2 and 6 (step 3). This step decreases the log likelihood (from log L = -0.602 to log L = -0.829), but it is the smallest decrease that can be achieved by any of the potential merges. Following that, states 1 and 5 can be merged without penalty (step 4). The resulting HMM is the minimal model generating the target language {ab, abab}, but what prevents us from merging further, to obtain an HMM for (ab)+?

It turns out that merging the remaining two states reduces the likelihood much more drastically than the previous, 'good' generalization step, from log L = -0.829 to log L = -3.465 (i.e., three decimal orders of magnitude). A preliminary answer, therefore, is to set the threshold small enough to allow only desirable generalizations. A more satisfactory answer is provided by the Bayesian methods described below.

Note that further data may well justify the generalization to a model for (ab)+. This data-driven character is one of the central aspects of model merging.

A domain-specific justification for model merging in the case of HMMs applies. It can be seen from the example that the structure of the generating HMM can always be recovered by an appropriate sequence of state merges from the initial model, provided that the available data 'covers' all of the generating model, i.e., each emission and transition is exercised at least once. Informally, this is because the initial model is obtained by 'unrolling' the paths used in generating the samples in the target model. The iterative merging process, then, is an attempt to undo the unrolling, tracing a search through the model space back to the generating model. Of course, the best-first heuristic is not guaranteed to find the appropriate sequence of merges, or, less critically, it may result in a model that is only weakly equivalent to the generating model.
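To make the thresholded best-first search concrete, the following is a minimal Python sketch of the merging loop described above. It is an illustration only, not the implementation used in this dissertation; log_likelihood, merge_states, and the states attribute are assumed helper names.

    from itertools import combinations

    def best_first_merge(model, data, threshold):
        """Greedily merge HMM states, at each step accepting the merge that
        costs the least log likelihood, until even the best remaining merge
        would lower log L(data) by more than `threshold`."""
        current = log_likelihood(model, data)  # assumed helper: log10 P(data | model)
        while len(model.states) > 1:
            # Evaluate every candidate state pair; keep the least harmful merge.
            best_model, best_ll = None, float("-inf")
            for s1, s2 in combinations(model.states, 2):
                candidate = merge_states(model, s1, s2)  # assumed helper: new HMM
                ll = log_likelihood(candidate, data)
                if ll > best_ll:
                    best_model, best_ll = candidate, ll
            if current - best_ll > threshold:
                break  # only undesirable generalizations remain
            model, current = best_model, best_ll
        return model

In the example above, any threshold between 0.227 (the cost of merging states 2 and 6) and 2.636 (the cost of the final, overgeneralizing merge) would accept steps 3 and 4 but stop the search at the minimal model for {ab, abab}.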
3.3.3 Priors for Hidden Markov Models

From the previous discussion it is clear that the choice of the prior distribution is important, since it is the term in (2.13) that drives generalization. We take the approach that priors should be subject to experimentation and empirical comparison of their ability to lead to useful generalization. The choice of a prior represents an intermediate level of probabilistic modeling, between the global choice of model formalism (HMMs, in our case) and the choice of a particular instance from a model class (e.g., a specific HMM structure and parameters). The model merging approach ideally replaces the usually poorly constrained choice of low-level parameters with a more robust choice of (few) prior parameters. As long as it does not assign zero probability to the correct model, the choice of prior is eventually overwhelmed by a sufficient amount of data. In practice, the ability to find the correct model may be limited by the search strategy used, in our case, the merging process.

HMMs are a special kind of parameterized graph structure. Unsurprisingly, many aspects of the priors discussed in this section can be found in Bayesian approaches to the induction of graph-based models in other domains, e.g., Bayesian networks (Cooper & Herskovits 1992; Buntine 1991) and decision trees (Buntine 1992).
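Since (2.13) is cited but not restated here, it may help to recall its general shape; assuming it is the standard Bayesian posterior over models M given data X (a reconstruction from the surrounding discussion, not a quotation of Chapter 2):

    P(M \mid X) \;=\; \frac{P(M)\,P(X \mid M)}{P(X)} \;\propto\; P(M)\,P(X \mid M)

The prior P(M) is then the factor that can favor a smaller, merged model even when the likelihood P(X | M) alone would prefer the unmerged one.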
