The dissertation of Andreas Stolcke is approved: University of ...




CHAPTER 3. HIDDEN MARKOV MODELS

[Figure 3.5: Case study I: Redundant BW-derived HMM structure for ac*a ∪ bc*b.]

Baum-Welch studies. It is instructive to inspect some of the HMM topologies found by the Baum-Welch estimator. Figure 3.4 shows models of 6 states trained on minimal samples, one exhibiting overgeneralization, and one demonstrating both overfitting and overgeneralization. The HMM in (a) generates (a ∪ b)c*(a ∪ b) and has redundantly allocated states to generate a ∪ b. The HMM in (b) generates (a ∪ b)c^k(a ∪ b) for k = 0, 1, 2, 3. Here, precious states have been wasted modeling the repetition of c's, instead of generalizing to a loop over a single state and using those states to model the distinction between a and b.

If estimation using the minimal number of states (6 in this case) is successful, the discretized structure is invariably that of the target model (Figure 3.2), as expected, although the probabilities will depend on the training sample used. Successful induction using 10 states, on the other hand, leads to models that, by definition, contain redundant states. However, the redundancy is not necessarily a simple duplication of states found in the target model structure.
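What it means for a discretized HMM structure to "generate" a language can be made concrete with a small sketch. The Python code below is an illustration under stated assumptions, not the procedure used in these experiments. It represents a discretized HMM as sets of allowed transitions and emissions. That structure is obtained by dropping every probability below a hypothetical cutoff (`threshold=0.01`). The code then enumerates all strings up to a given length that some path from the start state to the end state can emit. The helper names (`discretize`, `generated_strings`), the state names and all probability values are invented for illustration. The 6-state model has the shape of a model for ac*a ∪ bc*b, and the 3-state model is a hypothetical overgeneralizing structure for (a ∪ b)c*(a ∪ b), not the one shown in Figure 3.4.

```python
# Sketch: discretize an HMM and enumerate the strings its structure generates.
# All state names, probabilities and the 0.01 cutoff are hypothetical.

def discretize(trans_probs, emit_probs, threshold=0.01):
    """Keep only transitions and emissions with probability >= threshold (assumed cutoff)."""
    trans = {s: {t for t, p in nxt.items() if p >= threshold}
             for s, nxt in trans_probs.items()}
    emit = {s: {x for x, p in dist.items() if p >= threshold}
            for s, dist in emit_probs.items()}
    return trans, emit


def generated_strings(trans, emit, max_len):
    """Return every string of length <= max_len emitted along some path from
    the non-emitting "start" state to the non-emitting "end" state."""
    results = set()
    frontier = [("start", "")]  # (current state, symbols emitted so far)
    while frontier:
        state, prefix = frontier.pop()
        for nxt in trans.get(state, ()):
            if nxt == "end":
                results.add(prefix)
            elif len(prefix) < max_len:
                # Entering an emitting state adds exactly one symbol.
                for sym in emit[nxt]:
                    frontier.append((nxt, prefix + sym))
    return results


# Hypothetical 6-state model with the shape of a target for ac*a ∪ bc*b.
target_trans = {
    "start": {"A1": 0.5, "B1": 0.5},
    "A1": {"C1": 0.7, "A2": 0.3},
    "C1": {"C1": 0.6, "A2": 0.4},
    "A2": {"end": 1.0},
    "B1": {"C2": 0.7, "B2": 0.3},
    "C2": {"C2": 0.6, "B2": 0.4},
    "B2": {"end": 1.0},
}
target_emit = {
    "A1": {"a": 0.995, "b": 0.005},  # the tiny "b" entry is removed by discretization
    "C1": {"c": 1.0},
    "A2": {"a": 1.0},
    "B1": {"b": 1.0},
    "C2": {"c": 1.0},
    "B2": {"b": 1.0},
}

# Hypothetical 3-state overgeneralizing structure for (a ∪ b)c*(a ∪ b).
over_trans = {
    "start": {"X": 1.0},
    "X": {"Y": 0.5, "Z": 0.5},
    "Y": {"Y": 0.5, "Z": 0.5},
    "Z": {"end": 1.0},
}
over_emit = {
    "X": {"a": 0.5, "b": 0.5},
    "Y": {"c": 1.0},
    "Z": {"a": 0.5, "b": 0.5},
}

target = generated_strings(*discretize(target_trans, target_emit), max_len=4)
over = generated_strings(*discretize(over_trans, over_emit), max_len=4)
print(sorted(target))         # ['aa', 'aca', 'acca', 'bb', 'bcb', 'bccb']
print(sorted(over - target))  # ['ab', 'acb', 'accb', 'ba', 'bca', 'bcca']
```

Comparing the two enumerated sets shows overgeneralization directly. Every string of the target language also appears in the larger set, which additionally contains mixed strings such as ab and acb.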
Instead, rather convoluted structures are found, such as the one in Figure 3.5 (induced from the 20 random samples).

Merging studies. We also investigated how the merging algorithm behaves for non-optimal values of the global prior weight λ. As explained earlier, this value is implicit in the number of ‘effective’ samples, the parameter that was kept constant in all experiments and which seems to be robust over roughly an order of magnitude. We therefore took the resulting value of λ and adjusted it both upward and downward by an order of
