12.07.2015 Views

The HTK Book Steve Young Gunnar Evermann Dan Kershaw ...

1.3 Output Probability Specification

Before the problem of parameter estimation can be discussed in more detail, the form of the output distributions $\{b_j(o_t)\}$ needs to be made explicit. HTK is designed primarily for modelling continuous parameters using continuous density multivariate output distributions. It can also handle observation sequences consisting of discrete symbols, in which case the output distributions are discrete probabilities. For simplicity, however, the presentation in this chapter will assume that continuous density distributions are being used. The minor differences that the use of discrete probabilities entails are noted in chapter 7 and discussed in more detail in chapter 11.

In common with most other continuous density HMM systems, HTK represents output distributions by Gaussian Mixture Densities. In HTK, however, a further generalisation is made. HTK allows each observation vector at time $t$ to be split into a number $S$ of independent data streams $o_{st}$. The formula for computing $b_j(o_t)$ is then

$$ b_j(o_t) = \prod_{s=1}^{S} \left[ \sum_{m=1}^{M_s} c_{jsm} \, \mathcal{N}(o_{st}; \mu_{jsm}, \Sigma_{jsm}) \right]^{\gamma_s} \qquad (1.8) $$

where $M_s$ is the number of mixture components in stream $s$, $c_{jsm}$ is the weight of the $m$'th component and $\mathcal{N}(\cdot; \mu, \Sigma)$ is a multivariate Gaussian with mean vector $\mu$ and covariance matrix $\Sigma$, that is

$$ \mathcal{N}(o; \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^n |\Sigma|}} \, e^{-\frac{1}{2}(o-\mu)' \Sigma^{-1} (o-\mu)} \qquad (1.9) $$

where $n$ is the dimensionality of $o$.

The exponent $\gamma_s$ is a stream weight (see footnote 1). It can be used to give a particular stream more emphasis; however, it can only be set manually. No current HTK training tools can estimate values for it.

Multiple data streams are used to enable separate modelling of multiple information sources. In HTK, the processing of streams is completely general.
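As an informal illustration of equations 1.8 and 1.9, the following Python sketch evaluates $b_j(o_t)$ for a toy state. It is not HTK code. It assumes diagonal covariance matrices (a simplification of the general $\Sigma$ in equation 1.9), and the function names, the dictionary layout of a state and all numerical values are hypothetical, chosen only for this example.

```python
import math

def diag_gaussian(o, mean, var):
    """Equation 1.9, restricted (as an assumption) to a diagonal covariance.

    `var` holds the diagonal of Sigma, so |Sigma| is the product of its
    entries and the quadratic form reduces to a per-dimension sum.
    """
    n = len(o)
    det = math.prod(var)
    quad = sum((x - m) ** 2 / v for x, m, v in zip(o, mean, var))
    return math.exp(-0.5 * quad) / math.sqrt((2.0 * math.pi) ** n * det)

def output_prob(streams, state):
    """Equation 1.8: product over streams of a mixture raised to gamma_s.

    `streams` is a list of the S observation sub-vectors o_st.
    `state` is a list of S dicts (hypothetical layout) with keys
    "gamma" (stream weight) and "mix" (list of (c, mean, var) tuples).
    """
    b = 1.0
    for o_s, stream in zip(streams, state):
        mixture = sum(c * diag_gaussian(o_s, mean, var)
                      for c, mean, var in stream["mix"])
        b *= mixture ** stream["gamma"]
    return b

# Toy state j with S = 2 streams; every number here is made up.
state_j = [
    {"gamma": 1.0, "mix": [(0.6, [0.0, 0.0], [1.0, 1.0]),
                           (0.4, [1.0, 1.0], [1.0, 1.0])]},
    {"gamma": 2.0, "mix": [(1.0, [0.0], [1.0])]},
]
o_t = [[0.0, 0.0], [0.0]]   # observation at time t, already split by stream
b = output_prob(o_t, state_j)
```

With $S = 1$ and $\gamma_1 = 1$ the function reduces to an ordinary Gaussian mixture density. For realistic vector sizes the product in equation 1.8 becomes very small, so a practical implementation would normally work with log probabilities instead; the direct form is kept here only because it mirrors the equation.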
However, the speech input modules assume that the source data is split into at most 4 streams. Chapter 5 discusses this in more detail but for now it is sufficient to remark that the default streams are the basic parameter vector, first (delta) and second (acceleration) difference coefficients and log energy.

1.4 Baum-Welch Re-Estimation

To determine the parameters of a HMM it is first necessary to make a rough guess at what they might be. Once this is done, more accurate (in the maximum likelihood sense) parameters can be found by applying the so-called Baum-Welch re-estimation formulae.

[Fig. 1.5 Representing a Mixture: an M-component Gaussian mixture state $j$, entered with probability $a_{ij}$, drawn as M single-Gaussian sub-states $j_1, j_2, \ldots, j_M$ entered with probabilities $a_{ij} c_{j1}, a_{ij} c_{j2}, \ldots, a_{ij} c_{jM}$.]

Chapter 8 gives the formulae used in HTK in full detail. Here the basis of the formulae will be presented in a very informal way. Firstly, it should be noted that the inclusion of multiple data streams does not alter matters significantly since each stream is considered to be statistically

Footnote 1: often referred to as a codebook exponent.
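The labels in Fig. 1.5 suggest reading an M-component mixture state as M single-Gaussian sub-states, where sub-state $j_m$ is entered with probability $a_{ij} c_{jm}$. The short Python sketch below illustrates only that bookkeeping. The function name and the numbers are hypothetical, and it assumes the mixture weights sum to one.

```python
def expand_mixture_state(a_ij, weights):
    """Split the transition i -> j across the M sub-states of Fig. 1.5.

    Sub-state j_m receives the entry probability a_ij * c_jm, where
    `weights` is the list of mixture weights c_j1 ... c_jM.
    """
    return [a_ij * c for c in weights]

# Made-up values: a_ij = 0.5 and a 3-component mixture.
sub_state_probs = expand_mixture_state(0.5, [0.2, 0.3, 0.5])
```

Because the weights sum to one, the sub-state entry probabilities sum back to $a_{ij}$, so the total probability of moving from state $i$ into state $j$ is unchanged by the expansion.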
