complexity and data fit. Second, since the finite-state models they investigate act as encoders/decoders of text, they are deterministic, i.e., the current state and the next input symbol determine a unique next state (it follows that each string has a unique derivation). This constrains the model space and allows states to be identified with string suffixes, which is the basis of all their algorithms. Finally, the models have no end states, since they are supposed to encode continuous text. This is actually a minor difference, since we can view the end-of-sentence as a special symbol, so that the final state is simply one that is dedicated to emitting that special symbol.

Bell et al. (1990) suggest state splitting as a more efficient induction technique for adaptively finding a finite-state model structure. In this approach, states are successively duplicated and differentiated according to their preceding context, whenever such a move promises to help the prediction of the following symbol. Ron et al. (1994) give a reformulation and formal analysis of this idea in terms of an information-theoretic evaluation function.

Interestingly, Bell et al. (1990) show that such a state splitting strategy confines the power of the finite-state model to that of a finite-context model. In models of this type there is always a finite bound k, such that the last k preceding symbols uniquely determine the distribution of the next symbol. In other words, state-based models derived by this kind of splitting are essentially n-gram models with variable (but bounded) context. This restriction applies equally to the algorithm of Ron et al. (1994).

By contrast, consider the HMM depicted in Figure 3.2, which is used below as a benchmark model. It describes a language in which the context needed for correct prediction of the final symbol is unbounded. Such a model can be found without difficulty by simple best-first merging. The major advantage of the splitting approach is that it is guaranteed to find the appropriate model if enough data is presented and if the target language is in fact finite-context.
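To make the finite-context restriction concrete, the following is a minimal sketch (hypothetical code; not the actual algorithm of Bell et al. (1990) or Ron et al. (1994)) of a predictor whose context is variable but bounded by k:

```python
from collections import defaultdict

class BoundedContextModel:
    """Variable-context (bounded) next-symbol predictor.

    Illustrative sketch: the last at most k symbols determine the
    next-symbol distribution, as in a variable-order n-gram model.
    """

    def __init__(self, k):
        self.k = k  # hard bound on context length
        # context (string suffix) -> next symbol -> count
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, string):
        for i, symbol in enumerate(string):
            # credit the symbol to every context suffix of length 0..k
            for length in range(min(i, self.k) + 1):
                self.counts[string[i - length:i]][symbol] += 1

    def predict(self, history):
        # back off to the longest context suffix seen in training
        for length in range(min(self.k, len(history)), -1, -1):
            context = history[len(history) - length:]
            if context in self.counts:
                dist = self.counts[context]
                total = sum(dist.values())
                return {s: c / total for s, c in dist.items()}
        return {}
```

Whatever the value of k, two histories that differ only in symbols more than k positions back map to the same context and therefore receive identical predictions. An HMM, by contrast, can carry such a distinction in its hidden state indefinitely, which is how a model like that of Figure 3.2 can make the final symbol depend on unboundedly distant context.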
3.5.4 Other probabilistic approaches

Another probabilistic approach to HMM structure induction, similar to ours, is described by Thomason & Granum (1986). The basic idea is to incrementally build a model structure by incorporating new samples using an extended form of Viterbi alignment. New samples are aligned to the existing model so as to maximize their likelihood, while allowing states to be inserted or deleted for alignment purposes. The procedure is limited to HMMs that have a left-to-right ordering of states, however; in particular, no loops are allowed.

In a sense this approach can be seen as an approximation to Bayesian HMM merging for this special class of models. The approximation in this case is twofold: the likelihood (not the posterior) is maximized, and only the likelihood of a single sample (rather than the entire data set) is considered.

Haussler et al. (1992) apply HMMs trained by the Baum-Welch method to the problem of protein primary structure alignment. Their model structures are mostly of a fixed, linear form, but subject to limited modification by a heuristic that inserts states ('stretches' the model) or deletes states ('shrinks' the model) based on the estimated probabilities.

Somewhat surprisingly, the work by Brown et al. (1992) on the construction of class-based n-gram
