The HTK Book Steve Young Gunnar Evermann Dan Kershaw ...

More documents

Recommendations

Info

1.2 Isolated Word Recognition 5where x(0) is constrained to be the model entry state and x(T + 1) is constrained to be the modelexit state.As an alternative to equation 1.5, the likelihood can be approximated by only considering themost likely state sequence, that is{ˆP (O|M) = maxXT∏a x(0)x(1)t=1b x(t) (o t )a x(t)x(t+1)}Although the direct computation of equations 1.5 and 1.6 is not tractable, simple recursiveprocedures exist which allow both quantities to be calculated very efficiently. Before going anyfurther, however, notice that if equation 1.2 is computable then the recognition problem is solved.Given a set of models M i corresponding to words w i , equation 1.2 is solved by using 1.3 andassuming thatP (O|w i ) = P (O|M i ). (1.7)All this, of course, assumes that the parameters {a ij } and {b j (o t )} are known for each modelM i . Herein lies the elegance and power of the HMM framework. Given a set of training examplescorresponding to a particular model, the parameters of that model can be determined automaticallyby a robust and efficient re-estimation procedure. Thus, provided that a sufficient number ofrepresentative examples of each word can be collected then a HMM can be constructed whichimplicitly models all of the many sources of variability inherent in real speech. Fig. 1.4 summarisesthe use of HMMs for isolated word recognition. Firstly, a HMM is trained for each vocabulary wordusing a number of examples of that word. In this case, the vocabulary consists of just three words:“one”, “two” and “three”. Secondly, to recognise some unknown word, the likelihood of each modelgenerating that word is calculated and the most likely model identifies the word.(1.6)(a) TrainingTraining Examplesone two three1.2.3.EstimateModels(b) RecognitionM 1 M 2 M 3Unknown O =P( O|M 1 ) P( O|M 2 ) P( O|M 3 )Choose MaxFig. 1.4Using HMMs for Isolated WordRecognition
1.3 Output Probability Specification 61.3 Output Probability SpecificationBefore the problem of parameter estimation can be discussed in more detail, the form of the outputdistributions {b j (o t )} needs to be made explicit. HTK is designed primarily for modelling continuousparameters using continuous density multivariate output distributions. It can also handleobservation sequences consisting of discrete symbols in which case, the output distributions arediscrete probabilities. For simplicity, however, the presentation in this chapter will assume thatcontinuous density distributions are being used. The minor differences that the use of discreteprobabilities entail are noted in chapter 7 and discussed in more detail in chapter 11.In common with most other continuous density HMM systems, HTK represents output distributionsby Gaussian Mixture Densities. In HTK, however, a further generalisation is made. HTKallows each observation vector at time t to be split into a number of S independent data streamso st . The formula for computing b j (o t ) is thenb j (o t ) =S∏s=1[Ms∑m=1c jsm N (o st ; µ jsm , Σ jsm )] γs(1.8)where M s is the number of mixture components in stream s, c jsm is the weight of the m’th componentand N (·; µ, Σ) is a multivariate Gaussian with mean vector µ and covariance matrix Σ, thatis1N (o; µ, Σ) = √(2π)n |Σ| e− 1 2 (o−µ)′ Σ −1 (o−µ)(1.9)where n is the dimensionality of o.The exponent γ s is a stream weight 1 . It can be used to give a particular stream more emphasis,however, it can only be set manually. No current HTK training tools can estimate values for it.Multiple data streams are used to enable separate modelling of multiple information sources. InHTK, the processing of streams is completely general. However, the speech input modules assumethat the source data is split into at most 4 streams. Chapter 5 discusses this in more detail but fornow it is sufficient to remark that the default streams are the basic parameter vector, first (delta)and second (acceleration) difference coefficients and log energy.1.4 Baum-Welch Re-EstimationTo determine the parameters of a HMM it is first necessary to make a rough guess at what theymight be. Once this is done, more accurate (in the maximum likelihood sense) parameters can befound by applying the so-called Baum-Welch re-estimation formulae.M-componentGaussianmixturea ijja ij c j1a ij c j2SingleGaussiansj 1j 2a ij c jM...j MFig. 1.5Representing a MixtureChapter 8 gives the formulae used in HTK in full detail. Here the basis of the formulae willbe presented in a very informal way. Firstly, it should be noted that the inclusion of multipledata streams does not alter matters significantly since each stream is considered to be statistically1 often referred to as a codebook exponent.
Page 1 and 2: The HTK BookSteve YoungGunnar Everm
Page 3 and 4: II HTK in Depth 454 The Operating E
Page 5 and 6: 12 Networks, Dictionaries and Langu
Page 8 and 9: 17.20HSLab . . . . . . . . . . . .
Page 10 and 11: Part ITutorial Overview1
Page 12 and 13: 1.1 General Principles of HMMs 31.1
Page 16 and 17: 1.4 Baum-Welch Re-Estimation 7indep
Page 18 and 19: 1.5 Recognition and Viterbi Decodin
Page 20 and 21: 1.6 Continuous Speech Recognition 1
Page 22 and 23: 1.7 Speaker Adaptation 13N-best whi
Page 24 and 25: 2.2 Generic Properties of a HTK Too
Page 26 and 27: 2.3 The Toolkit 17HLEHLSDTATSTransc
Page 28 and 29: 2.4 Whats New In Version 3.2 19full
Page 30 and 31: 2.4 Whats New In Version 3.2 212.4.
Page 32 and 33: 3.1 Data Preparation 233.1 Data Pre
Page 34 and 35: 3.1 Data Preparation 25S0001 ONE VA
Page 36 and 37: 3.1 Data Preparation 273.1.4 Step 4
Page 38 and 39: 3.1 Data Preparation 29TIMITPrompts
Page 40 and 41: 3.2 Creating Monophone HMMs 31 5 2
Page 42 and 43: 3.2 Creating Monophone HMMs 33and i
Page 44 and 45: 3.3 Creating Tied-State Triphones 3
Page 50 and 51: 3.5 Running the Recogniser Live 41W
Page 52 and 53: 3.6 Adapting the HMMs 43would produ
Page 54 and 55: Part IIHTK in Depth45
Page 56 and 57: 4.1 The Command Line 474.1 The Comm
Page 58 and 59: 4.3 Configuration Files 49Configura
Page 60 and 61: 4.6 Strings and Names 514.6 Strings
Page 62: 4.8 Input/Output via Pipes and Netw
Page 65 and 66:
Chapter 5Speech Input/OutputMany to
Page 67 and 68:
5.2 Speech Signal Processing 58then
Page 69 and 70:
5.3 Linear Prediction Analysis 60Ce
Page 71 and 72:
5.5 Vocal Tract Length Normalisatio
Page 73 and 74:
5.7 Perceptual Linear Prediction 64
Page 75 and 76:
5.10 Storage of Parameter Files 66w
Page 77 and 78:
5.11 Waveform File Formats 685.10.2
Page 79 and 80:
5.11 Waveform File Formats 705.11.5
Page 81 and 82:
5.12 Direct Audio Input/Output 72Va
Page 83 and 84:
5.14 Vector Quantisation 74LPC_ELPC
Page 85 and 86:
5.15 Viewing Speech with HList 76in
Page 87 and 88:
5.16 Copying and Coding using HCopy
Page 89 and 90:
5.18 Summary 805.18 SummaryThis sec
Page 91 and 92:
5.18 Summary 82Module Name Default
Page 93 and 94:
6.2 Label File Formats 84Orthogonal
Page 95 and 96:
6.3 Master Label Files 866.3 Master
Page 97 and 98:
6.3 Master Label Files 88search wit
Page 99 and 100:
6.4 Editing Label Files 90These few
Page 101 and 102:
6.4 Editing Label Files 92DC V iy a
Page 103 and 104:
Chapter 7HMM Definition FilesSpeech
Page 105 and 106:
7.2 Basic HMM Definitions 96• opt
Page 107 and 108:
7.2 Basic HMM Definitions 98∼h
Page 109 and 110:
7.3 Macro Definitions 100∼o 4 2
Page 111 and 112:
7.3 Macro Definitions 102Once defin
Page 113 and 114:
7.4 HMM Sets 104∼o 4 ∼s “sta
Page 115 and 116:
7.4 HMM Sets 106HTool -H mf1 -H mf2
Page 117 and 118:
7.6 Discrete Probability HMMs 108
Page 119 and 120:
7.9 Regression Class Trees for Adap
Page 121 and 122:
7.11 The HMM Definition Language 11
Page 123 and 124:
7.11 The HMM Definition Language 11
Page 125 and 126:
Chapter 8HMM Parameter Estimationth
Page 127 and 128:
8.1 Training Strategies 118Labelled
Page 129 and 130:
8.2 Initialisation using HInit 120P
Page 131 and 132:
8.3 Flat Starting with HCompV 122da
Page 133 and 134:
8.4 Isolated Unit Re-Estimation usi
Page 135 and 136:
8.5 Embedded Training using HERest
Page 137 and 138:
8.6 Single-Pass Retraining 1288.6 S
Page 139 and 140:
8.8 Parameter Re-Estimation Formula
Page 141 and 142:
8.8 Parameter Re-Estimation Formula
Page 143 and 144:
Chapter 9HMM AdaptationSpeaker Inde
Page 145 and 146:
9.1 Model Adaptation using MLLR 136
Page 147 and 148:
9.2 Model Adaptation using MAP 138
Page 149 and 150:
9.3 Using HEAdapt 1409.3 Using HEAd
Page 151 and 152:
9.4 MLLR Formulae 142Mthe model set
Page 153 and 154:
Chapter 10HMM System RefinementHHED
Page 155 and 156:
10.3 Parameter Tying and Item Lists
Page 157 and 158:
10.4 Data-Driven Clustering 148When
Page 159 and 160:
10.5 Tree-Based Clustering 150TC 10
Page 161 and 162:
10.6 Mixture Incrementing 152QS "L_
Page 163 and 164:
10.8 Miscellaneous Operations 154
Page 165 and 166:
11.2 Using Discrete Models with Spe
Page 167 and 168:
11.3 Tied Mixture Systems 158Speech
Page 169 and 170:
11.4 Parameter Smoothing 160used, t
Page 171 and 172:
12.1 How Networks are Used 162HBuil
Page 173 and 174:
12.2 Word Networks and Standard Lat
Page 175 and 176:
12.3 Building a Word Network with H
Page 177 and 178:
12.4 Bigram Language Models 168wher
Page 179 and 180:
12.6 Testing a Word Network using H
Page 181 and 182:
¡ ¡ ¡ ¡ ¡ ¡ ¡ ¡ ¡ ¡ ¡¡
Page 183 and 184:
12.8 Word Network Expansion 174(a)
Page 185 and 186:
12.9 Other Kinds of Recognition Sys
Page 187 and 188:
Chapter 13Decoding?silonetwo...zero
Page 189 and 190:
13.2 Decoder Organisation 180When u
Page 191 and 192:
13.3 Recognition using Test Databas
Page 193 and 194:
13.4 Evaluating Recognition Results
Page 195 and 196:
13.5 Generating Forced Alignments 1
Page 197 and 198:
13.7 Recognition using Direct Audio
Page 199 and 200:
13.8 N-Best Lists and Lattices 190V
Page 201 and 202:
Chapter 14Fundamentals of languagem
Page 203 and 204:
14.1 n-gram language models 194prob
Page 205 and 206:
14.2 Statistically-derived Class Ma
Page 207 and 208:
14.3 Robust model estimation 198Dis
Page 209 and 210:
14.5 Overview of n-Gram Constructio
Page 211 and 212:
14.6 Class-Based Language Models 20
Page 213 and 214:
15.1 Database preparation 204you ma
Page 215 and 216:
15.2 Mapping OOV words 20615.2 Mapp
Page 217 and 218:
15.4 Testing the LM perplexity 2080
Page 219 and 220:
15.6 Model interpolation 210may set
Page 221 and 222:
15.7 Class-based models 212together
Page 223 and 224:
15.8 Problem solving 214$ LPlex -u
Page 225 and 226:
Chapter 16Language Modelling Refere
Page 227 and 228:
16.4 Class Map Files 218The first t
Page 229 and 230:
16.5 Gram Files 220Notice that the
Page 231 and 232:
16.7 Word LM file formats 22216.7.1
Page 233 and 234:
16.8 Class LM file formats 22416.8.
Page 235 and 236:
16.9 Language modelling tracing 226
Page 237 and 238:
16.11 Compile-time configuration pa
Page 239 and 240:
Part IVReference Section230
Page 241 and 242:
17.1 Cluster 23217.1 Cluster17.1.1
Page 243 and 244:
17.1 Cluster 23417.1.3 TracingClust
Page 245 and 246:
17.2 HBuild 23617.2.3 TracingHBuild
Page 247 and 248:
17.3 HCompV 238-l s The string s mu
Page 249 and 250:
17.4 HCopy 240-t n Set the line wid
Page 251 and 252:
17.5 HDMan 24217.5 HDMan17.5.1 Func
Page 253 and 254:
17.5 HDMan 24417.5.3 TracingHDMan s
Page 255 and 256:
17.6 HEAdapt 246-m f Set the minimu
Page 257 and 258:
17.7 HERest 24817.7 HERest17.7.1 Fu
Page 259 and 260:
17.7 HERest 250-w f Any mixture wei
Page 261 and 262:
17.8 HHEd 252For example,stateComp
Page 263 and 264:
17.8 HHEd 254FA varscaleComputes an
Page 265 and 266:
17.8 HHEd 256where V s is the dimen
Page 267 and 268:
17.8 HHEd 258mixture(m) all mixture
Page 269 and 270:
17.9 HInit 26017.9 HInit17.9.1 Func
Page 271 and 272:
17.10 HLEd 26217.10 HLEd17.10.1 Fun
Page 273 and 274:
17.10 HLEd 26417.10.3 TracingHLEd s
Page 275 and 276:
17.12 HLMCopy 26617.12 HLMCopy17.12
Page 277 and 278:
17.13 HLRescore 268-p f Set the wor
Page 279 and 280:
17.14 HLStats 27017.14.3 UseHLStats
Page 281 and 282:
17.15 HParse 272Note that C style c
Page 283 and 284:
17.15 HParse 274HParse will then re
Page 285 and 286:
17.16 HQuant 276where vqFile is the
Page 287 and 288:
17.17 HRest 278-v f This sets the m
Page 289 and 290:
17.18 HResults 280WORD: %Corr=63.91
Page 291 and 292:
17.18 HResults 28217.18.3 TracingHR
Page 293 and 294:
17.20 HSLab 28417.20 HSLab17.20.1 F
Page 295 and 296:
17.20 HSLab 286Load Load a speech d
Page 297 and 298:
17.21 HSmooth 28817.21 HSmooth17.21
Page 299 and 300:
17.22 HVite 29017.22 HVite17.22.1 F
Page 301 and 302:
17.22 HVite 292-v f Enable word end
Page 303 and 304:
17.23 LAdapt 29417.23.3 TracingLAda
Page 305 and 306:
17.25 LFoF 29617.25 LFoF17.25.1 Fun
Page 307 and 308:
17.26 LGCopy 298-r s Set the root n
Page 309 and 310:
17.28 LGPrep 30017.28 LGPrep17.28.1
Page 311 and 312:
17.28 LGPrep 302-r s Set the root n
Page 313 and 314:
17.30 LMerge 30417.30 LMerge17.30.1
Page 315 and 316:
17.32 LNorm 30617.32 LNorm17.32.1 F
Page 317 and 318:
17.33 LPlex 30817.33.3 TracingLPlex
Page 319 and 320:
Chapter 18Configuration VariablesTh
Page 321 and 322:
18.1 Configuration Variables used i
Page 323 and 324:
18.2 Configuration Variables used i
Page 325 and 326:
19.1 Generic Errors 316HCopy 1000-1
Page 327 and 328:
19.2 Summary of Errors by Tool and
Page 329 and 330:
Page 331 and 332:
Page 333 and 334:
Page 335 and 336:
Page 337 and 338:
Page 339 and 340:
Page 341 and 342:
Page 343 and 344:
Chapter 20HTK Standard Lattice Form
Page 345 and 346:
20.4 Field Types 336segments = ":"
Page 347 and 348:
20.5 Example SLF file 338J=24 S=19
Page 349 and 350:
Index 340default, 48format, 49types
Page 351 and 352:
Index 342SILMARGIN, 72SILSEQCOUNT,
Page 353 and 354:
Index 344output lattice format, 189
Page 355:
Index 346USEHAMMING, 59USEPOWER, 62
show all

The HTK Book Steve Young Gunnar Evermann Dan Kershaw ...

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?