A. Get some new samples $X$ and incorporate them into the current model $M_i$.

B. Loop:

1. Compute a set of candidate merges $K$ from among the states of model $M_i$.
2. For each candidate $k \in K$, compute the merged model $k(M_i)$ and its posterior probability $P(k(M_i) \mid X)$.
3. Let $k^*$ be the merge that maximizes $P(k(M_i) \mid X)$. Then let $M_{i+1} := k^*(M_i)$.
4. If $P(M_{i+1} \mid X) < P(M_i \mid X)$, break from the loop.
5. Let $i := i + 1$.

C. If the data is exhausted, break from the loop and return $M_i$ as the induced model; otherwise, repeat from step A.

Incremental merging might in principle produce worse results than the batch version, since the evaluation step does not have as much data at its disposal. However, we did not find this to be a significant disadvantage in practice. One can optimize the number of samples incorporated in each step A (the batch size) for overall speed. This requires balancing the gains due to smaller model size against the constant overhead of each execution of step B. The best value will depend on the data and on how much merging is actually possible on each iteration; we found between 1 and 10 samples at a time to be good choices. (A code sketch of this loop is given after Section 3.4.1 below.)

One has to be careful not to start merging with extremely small models, such as the model resulting from incorporating only a few short samples. Many of the priors discussed earlier contain logarithmic terms that approach singularities ($\log 0$) in this case, which can produce poor results, usually by leading to extreme merging. This can easily be prevented by incorporating a larger number of samples (say, 10 to 20) before going on to the first merging step.

Further modifications to the simple best-first search strategy are discussed in Section 3.4.5.

3.4 Implementation Issues

In this section we elaborate on the implementation of the various steps in the generic HMM merging algorithm presented in Section 3.3.5.

3.4.1 Efficient sample incorporation

In the simplest case, this step creates a dedicated state for each instance of a symbol in any of the samples in $X$. These states are chained with transitions of probability 1, such that a sample $x_1 \ldots x_l$ is generated by a state sequence $q_1 \ldots q_l$. State $q_1$ can be reached from the initial state via a transition of probability $1/N$, where $N$ is the total number of samples. State $q_l$ connects to the final state with probability 1. All states $q_1, \ldots, q_l$ emit their corresponding output symbol with probability 1.
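As an illustration, the following Python fragment is a minimal sketch of this naive construction. The names State, ChainHMM, and incorporate are chosen for exposition only, and samples are assumed to be non-empty sequences of symbols.

```python
class State:
    """A dedicated state that emits a single symbol with probability 1."""
    def __init__(self, symbol):
        self.symbol = symbol
        self.transitions = {}        # successor -> transition probability

class ChainHMM:
    """Initial model: one chain of dedicated states per sample."""
    FINAL = "final"                  # sentinel for the final state

    def __init__(self):
        self.start = {}              # chain head -> initial-transition probability
        self.num_samples = 0         # N, the total number of samples

    def incorporate(self, sample):
        """Add one sample x_1 ... x_l as a fresh chain q_1 ... q_l."""
        chain = [State(symbol) for symbol in sample]
        for q, q_next in zip(chain, chain[1:]):
            q.transitions[q_next] = 1.0              # q_i -> q_{i+1}, probability 1
        chain[-1].transitions[ChainHMM.FINAL] = 1.0  # q_l -> final state
        self.num_samples += 1
        self.start[chain[0]] = 1.0
        for head in self.start:                      # initial state reaches each
            self.start[head] = 1.0 / self.num_samples  # chain head with prob. 1/N

# Example: after incorporating "ab" and "abab", each chain head is
# reached from the initial state with probability 1/2.
model = ChainHMM()
for sample in ["ab", "abab"]:
    model.incorporate(sample)
```

The resulting model generates exactly the observed sample chains, each with probability $1/N$; merging then generalizes from this most specific starting model.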

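Returning to the incremental merging algorithm above, the following skeleton renders its control flow (steps A through C) in Python. The helpers candidate_merges, apply_merge, and log_posterior are placeholders supplied by the caller for the machinery described in the remainder of this section; they are not specified here.

```python
def incremental_merging(samples, model, candidate_merges, apply_merge,
                        log_posterior, batch_size=5):
    """Control flow of incremental best-first merging (steps A-C above).

    candidate_merges(model) -> iterable of candidate merges K
    apply_merge(model, k)   -> the merged model k(M)
    log_posterior(model)    -> log P(M | X), up to a constant
    """
    stream = iter(samples)
    while True:
        # Step A: incorporate the next batch into the current model.
        batch = [s for _, s in zip(range(batch_size), stream)]
        if not batch:
            return model                       # step C: data exhausted
        for s in batch:
            model.incorporate(s)
        # Step B: greedily accept merges while the posterior does not decrease.
        while True:
            candidates = list(candidate_merges(model))           # B.1
            if not candidates:
                break
            best = max(candidates,                               # B.2-B.3
                       key=lambda k: log_posterior(apply_merge(model, k)))
            merged = apply_merge(model, best)
            if log_posterior(merged) < log_posterior(model):     # B.4
                break
            model = merged                                       # B.5
```

Since only comparisons between posteriors are needed, any monotonic transform of $P(M \mid X)$ can be used; working with log posteriors in particular avoids numerical underflow.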