The HTK Book Steve Young Gunnar Evermann Dan Kershaw ...

More documents

Recommendations

Info

3.3 Creating Tied-State Triphones 39Triphones(hmm12)State Occs(stats)HHEDEdit Script(tree.hed)Tied−StateTriphones(hmm13)HMM List(tiedlist)HREST (x2)Tied−StateTriphones(hmm15)Fig. 3.14 Step 10Notice that for a triphone system, it is necessary to include questions referring to both the rightand left contexts of a phone. The questions should progress from wide, general classifications (suchas consonant, vowel, nasal, diphthong, etc.) to specific instances of each phone. Ideally, the full setof questions loaded using the QS command would include every possible context which can influencethe acoustic realisation of a phone, and can include any linguistic or phonetic classification whichmay be relevant. There is no harm in creating extra unnecessary questions, because those whichare determined to be irrelevant to the data will be ignored.The second TR command enables intermediate level progress reporting so that each of the followingTB commands can be monitored. Each of these TB commands clusters one specific set ofstates. For example, the first TB command applies to the first emitting state of all context-dependentmodels for the phone aa.Each TB command works as follows. Firstly, each set of states defined by the final argument ispooled to form a single cluster. Each question in the question set loaded by the QS commands isused to split the pool into two sets. The use of two sets rather than one, allows the log likelihood ofthe training data to be increased and the question which maximises this increase is selected for thefirst branch of the tree. The process is then repeated until the increase in log likelihood achievableby any question at any node is less than the threshold specified by the first argument (350.0 in thiscase).Note that the values given in the RO and TB commands affect the degree of tying and thereforethe number of states output in the clustered system. The values should be varied according to theamount of training data available. As a final step to the clustering, any pair of clusters which canbe merged such that the decrease in log likelihood is below the threshold is merged. On completion,the states in each cluster i are tied to form a single shared state with macro name xxx i where xxxis the name given by the second argument of the TB command.The set of triphones used so far only includes those needed to cover the training data. The AUcommand takes as its argument a new list of triphones expanded to include all those needed forrecognition. This list can be generated, for example, by using HDMan on the entire dictionary(not just the training dictionary), converting it to triphones using the command TC and outputtinga list of the distinct triphones to a file using the option -nHDMan -b sp -n fulllist -g global.ded -l flog beep-tri beepThe -b sp option specifies that the sp phone is used as a word boundary, and so is excluded fromtriphones. The effect of the AU command is to use the decision trees to synthesise all of the newpreviously unseen triphones in the new list.
3.4 Recogniser Evaluation 40Once all state-tying has been completed and new models synthesised, some models may shareexactly the same 3 states and transition matrices and are thus identical. The CO command is usedto compact the model set by finding all identical models and tying them together 5 , producing anew list of models called tiedlist.One of the advantages of using decision tree clustering is that it allows previously unseen triphonesto be synthesised. To do this, the trees must be saved and this is done by the ST command.Later if new previously unseen triphones are required, for example in the pronunciation of a newvocabulary item, the existing model set can be reloaded into HHEd, the trees reloaded using theLT command and then a new extended list of triphones created using the AU command.After HHEd has completed, the effect of tying can be studied and the thresholds adjusted ifnecessary. The log file will include summary statistics which give the total number of physical statesremaining and the number of models after compacting.Finally, and for the last time, the models are re-estimated twice using HERest. Fig. 3.14illustrates this last step in the HMM build process. The trained models are then contained in thefile hmm15/hmmdefs.3.4 Recogniser EvaluationThe recogniser is now complete and its performance can be evaluated. The recognition networkand dictionary have already been constructed, and test data has been recorded. Thus, all thatis necessary is to run the recogniser and then evaluate the results using the HTK analysis toolHResults3.4.1 Step 11 - Recognising the Test DataAssuming that test.scp holds a list of the coded test files, then each test file will be recognisedand its transcription output to an MLF called recout.mlf by executing the followingHVite -H hmm15/macros -H hmm15/hmmdefs -S test.scp \-l ’*’ -i recout.mlf -w wdnet \-p 0.0 -s 5.0 dict tiedlistThe options -p and -s set the word insertion penalty and the grammar scale factor, respectively.The word insertion penalty is a fixed value added to each token when it transits from the end ofone word to the start of the next. The grammar scale factor is the amount by which the languagemodel probability is scaled before being added to each token as it transits from the end of one wordto the start of the next. These parameters can have a significant effect on recognition performanceand hence, some tuning on development test data is well worthwhile.The dictionary contains monophone transcriptions whereas the supplied HMM list contains wordinternal triphones. HVite will make the necessary conversions when loading the word networkwdnet. However, if the HMM list contained both monophones and context-dependent phones thenHVite would become confused. The required form of word-internal network expansion can beforced by setting the configuration variable FORCECXTEXP to true and ALLOWXWRDEXP to false (seechapter 12 for details).Assuming that the MLF testref.mlf contains word level transcriptions for each test file 6 , theactual performance can be determined by running HResults as followsHResults -I testref.mlf tiedlist recout.mlfthe result would be a print-out of the form====================== HTK Results Analysis ==============Date: Sun Oct 22 16:14:45 1995Ref : testrefs.mlfRec : recout.mlf------------------------ Overall Results -----------------SENT: %Correct=98.50 [H=197, S=3, N=200]5 Note that if the transition matrices had not been tied, the CO command would be ineffective since all modelswould be different by virtue of their unique transition matrices.6 The HLEd tool may have to be used to insert silences at the start and end of each transcription or alternativelyHResults can be used to ignore silences (or any other symbols) using the -e option
Page 1 and 2: The HTK BookSteve YoungGunnar Everm
Page 3 and 4: II HTK in Depth 454 The Operating E
Page 5 and 6: 12 Networks, Dictionaries and Langu
Page 8 and 9: 17.20HSLab . . . . . . . . . . . .
Page 10 and 11: Part ITutorial Overview1
Page 12 and 13: 1.1 General Principles of HMMs 31.1
Page 14 and 15: 1.2 Isolated Word Recognition 5wher
Page 16 and 17: 1.4 Baum-Welch Re-Estimation 7indep
Page 18 and 19: 1.5 Recognition and Viterbi Decodin
Page 20 and 21: 1.6 Continuous Speech Recognition 1
Page 22 and 23: 1.7 Speaker Adaptation 13N-best whi
Page 24 and 25: 2.2 Generic Properties of a HTK Too
Page 26 and 27: 2.3 The Toolkit 17HLEHLSDTATSTransc
Page 28 and 29: 2.4 Whats New In Version 3.2 19full
Page 30 and 31: 2.4 Whats New In Version 3.2 212.4.
Page 32 and 33: 3.1 Data Preparation 233.1 Data Pre
Page 34 and 35: 3.1 Data Preparation 25S0001 ONE VA
Page 36 and 37: 3.1 Data Preparation 273.1.4 Step 4
Page 38 and 39: 3.1 Data Preparation 29TIMITPrompts
Page 40 and 41: 3.2 Creating Monophone HMMs 31 5 2
Page 42 and 43: 3.2 Creating Monophone HMMs 33and i
Page 44 and 45: 3.3 Creating Tied-State Triphones 3
Page 46 and 47: 3.3 Creating Tied-State Triphones 3
Page 50 and 51: 3.5 Running the Recogniser Live 41W
Page 52 and 53: 3.6 Adapting the HMMs 43would produ
Page 54 and 55: Part IIHTK in Depth45
Page 56 and 57: 4.1 The Command Line 474.1 The Comm
Page 58 and 59: 4.3 Configuration Files 49Configura
Page 60 and 61: 4.6 Strings and Names 514.6 Strings
Page 62: 4.8 Input/Output via Pipes and Netw
Page 65 and 66: Chapter 5Speech Input/OutputMany to
Page 67 and 68: 5.2 Speech Signal Processing 58then
Page 69 and 70: 5.3 Linear Prediction Analysis 60Ce
Page 71 and 72: 5.5 Vocal Tract Length Normalisatio
Page 73 and 74: 5.7 Perceptual Linear Prediction 64
Page 75 and 76: 5.10 Storage of Parameter Files 66w
Page 77 and 78: 5.11 Waveform File Formats 685.10.2
Page 79 and 80: 5.11 Waveform File Formats 705.11.5
Page 81 and 82: 5.12 Direct Audio Input/Output 72Va
Page 83 and 84: 5.14 Vector Quantisation 74LPC_ELPC
Page 85 and 86: 5.15 Viewing Speech with HList 76in
Page 87 and 88: 5.16 Copying and Coding using HCopy
Page 89 and 90: 5.18 Summary 805.18 SummaryThis sec
Page 91 and 92: 5.18 Summary 82Module Name Default
Page 93 and 94: 6.2 Label File Formats 84Orthogonal
Page 95 and 96: 6.3 Master Label Files 866.3 Master
Page 97 and 98: 6.3 Master Label Files 88search wit
Page 99 and 100:
6.4 Editing Label Files 90These few
Page 101 and 102:
6.4 Editing Label Files 92DC V iy a
Page 103 and 104:
Chapter 7HMM Definition FilesSpeech
Page 105 and 106:
7.2 Basic HMM Definitions 96• opt
Page 107 and 108:
7.2 Basic HMM Definitions 98∼h
Page 109 and 110:
7.3 Macro Definitions 100∼o 4 2
Page 111 and 112:
7.3 Macro Definitions 102Once defin
Page 113 and 114:
7.4 HMM Sets 104∼o 4 ∼s “sta
Page 115 and 116:
7.4 HMM Sets 106HTool -H mf1 -H mf2
Page 117 and 118:
7.6 Discrete Probability HMMs 108
Page 119 and 120:
7.9 Regression Class Trees for Adap
Page 121 and 122:
7.11 The HMM Definition Language 11
Page 123 and 124:
7.11 The HMM Definition Language 11
Page 125 and 126:
Chapter 8HMM Parameter Estimationth
Page 127 and 128:
8.1 Training Strategies 118Labelled
Page 129 and 130:
8.2 Initialisation using HInit 120P
Page 131 and 132:
8.3 Flat Starting with HCompV 122da
Page 133 and 134:
8.4 Isolated Unit Re-Estimation usi
Page 135 and 136:
8.5 Embedded Training using HERest
Page 137 and 138:
8.6 Single-Pass Retraining 1288.6 S
Page 139 and 140:
8.8 Parameter Re-Estimation Formula
Page 141 and 142:
8.8 Parameter Re-Estimation Formula
Page 143 and 144:
Chapter 9HMM AdaptationSpeaker Inde
Page 145 and 146:
9.1 Model Adaptation using MLLR 136
Page 147 and 148:
9.2 Model Adaptation using MAP 138
Page 149 and 150:
9.3 Using HEAdapt 1409.3 Using HEAd
Page 151 and 152:
9.4 MLLR Formulae 142Mthe model set
Page 153 and 154:
Chapter 10HMM System RefinementHHED
Page 155 and 156:
10.3 Parameter Tying and Item Lists
Page 157 and 158:
10.4 Data-Driven Clustering 148When
Page 159 and 160:
10.5 Tree-Based Clustering 150TC 10
Page 161 and 162:
10.6 Mixture Incrementing 152QS "L_
Page 163 and 164:
10.8 Miscellaneous Operations 154
Page 165 and 166:
11.2 Using Discrete Models with Spe
Page 167 and 168:
11.3 Tied Mixture Systems 158Speech
Page 169 and 170:
11.4 Parameter Smoothing 160used, t
Page 171 and 172:
12.1 How Networks are Used 162HBuil
Page 173 and 174:
12.2 Word Networks and Standard Lat
Page 175 and 176:
12.3 Building a Word Network with H
Page 177 and 178:
12.4 Bigram Language Models 168wher
Page 179 and 180:
12.6 Testing a Word Network using H
Page 181 and 182:
¡ ¡ ¡ ¡ ¡ ¡ ¡ ¡ ¡ ¡ ¡¡
Page 183 and 184:
12.8 Word Network Expansion 174(a)
Page 185 and 186:
12.9 Other Kinds of Recognition Sys
Page 187 and 188:
Chapter 13Decoding?silonetwo...zero
Page 189 and 190:
13.2 Decoder Organisation 180When u
Page 191 and 192:
13.3 Recognition using Test Databas
Page 193 and 194:
13.4 Evaluating Recognition Results
Page 195 and 196:
13.5 Generating Forced Alignments 1
Page 197 and 198:
13.7 Recognition using Direct Audio
Page 199 and 200:
13.8 N-Best Lists and Lattices 190V
Page 201 and 202:
Chapter 14Fundamentals of languagem
Page 203 and 204:
14.1 n-gram language models 194prob
Page 205 and 206:
14.2 Statistically-derived Class Ma
Page 207 and 208:
14.3 Robust model estimation 198Dis
Page 209 and 210:
14.5 Overview of n-Gram Constructio
Page 211 and 212:
14.6 Class-Based Language Models 20
Page 213 and 214:
15.1 Database preparation 204you ma
Page 215 and 216:
15.2 Mapping OOV words 20615.2 Mapp
Page 217 and 218:
15.4 Testing the LM perplexity 2080
Page 219 and 220:
15.6 Model interpolation 210may set
Page 221 and 222:
15.7 Class-based models 212together
Page 223 and 224:
15.8 Problem solving 214$ LPlex -u
Page 225 and 226:
Chapter 16Language Modelling Refere
Page 227 and 228:
16.4 Class Map Files 218The first t
Page 229 and 230:
16.5 Gram Files 220Notice that the
Page 231 and 232:
16.7 Word LM file formats 22216.7.1
Page 233 and 234:
16.8 Class LM file formats 22416.8.
Page 235 and 236:
16.9 Language modelling tracing 226
Page 237 and 238:
16.11 Compile-time configuration pa
Page 239 and 240:
Part IVReference Section230
Page 241 and 242:
17.1 Cluster 23217.1 Cluster17.1.1
Page 243 and 244:
17.1 Cluster 23417.1.3 TracingClust
Page 245 and 246:
17.2 HBuild 23617.2.3 TracingHBuild
Page 247 and 248:
17.3 HCompV 238-l s The string s mu
Page 249 and 250:
17.4 HCopy 240-t n Set the line wid
Page 251 and 252:
17.5 HDMan 24217.5 HDMan17.5.1 Func
Page 253 and 254:
17.5 HDMan 24417.5.3 TracingHDMan s
Page 255 and 256:
17.6 HEAdapt 246-m f Set the minimu
Page 257 and 258:
17.7 HERest 24817.7 HERest17.7.1 Fu
Page 259 and 260:
17.7 HERest 250-w f Any mixture wei
Page 261 and 262:
17.8 HHEd 252For example,stateComp
Page 263 and 264:
17.8 HHEd 254FA varscaleComputes an
Page 265 and 266:
17.8 HHEd 256where V s is the dimen
Page 267 and 268:
17.8 HHEd 258mixture(m) all mixture
Page 269 and 270:
17.9 HInit 26017.9 HInit17.9.1 Func
Page 271 and 272:
17.10 HLEd 26217.10 HLEd17.10.1 Fun
Page 273 and 274:
17.10 HLEd 26417.10.3 TracingHLEd s
Page 275 and 276:
17.12 HLMCopy 26617.12 HLMCopy17.12
Page 277 and 278:
17.13 HLRescore 268-p f Set the wor
Page 279 and 280:
17.14 HLStats 27017.14.3 UseHLStats
Page 281 and 282:
17.15 HParse 272Note that C style c
Page 283 and 284:
17.15 HParse 274HParse will then re
Page 285 and 286:
17.16 HQuant 276where vqFile is the
Page 287 and 288:
17.17 HRest 278-v f This sets the m
Page 289 and 290:
17.18 HResults 280WORD: %Corr=63.91
Page 291 and 292:
17.18 HResults 28217.18.3 TracingHR
Page 293 and 294:
17.20 HSLab 28417.20 HSLab17.20.1 F
Page 295 and 296:
17.20 HSLab 286Load Load a speech d
Page 297 and 298:
17.21 HSmooth 28817.21 HSmooth17.21
Page 299 and 300:
17.22 HVite 29017.22 HVite17.22.1 F
Page 301 and 302:
17.22 HVite 292-v f Enable word end
Page 303 and 304:
17.23 LAdapt 29417.23.3 TracingLAda
Page 305 and 306:
17.25 LFoF 29617.25 LFoF17.25.1 Fun
Page 307 and 308:
17.26 LGCopy 298-r s Set the root n
Page 309 and 310:
17.28 LGPrep 30017.28 LGPrep17.28.1
Page 311 and 312:
17.28 LGPrep 302-r s Set the root n
Page 313 and 314:
17.30 LMerge 30417.30 LMerge17.30.1
Page 315 and 316:
17.32 LNorm 30617.32 LNorm17.32.1 F
Page 317 and 318:
17.33 LPlex 30817.33.3 TracingLPlex
Page 319 and 320:
Chapter 18Configuration VariablesTh
Page 321 and 322:
18.1 Configuration Variables used i
Page 323 and 324:
18.2 Configuration Variables used i
Page 325 and 326:
19.1 Generic Errors 316HCopy 1000-1
Page 327 and 328:
19.2 Summary of Errors by Tool and
Page 329 and 330:
Page 331 and 332:
Page 333 and 334:
Page 335 and 336:
Page 337 and 338:
Page 339 and 340:
Page 341 and 342:
Page 343 and 344:
Chapter 20HTK Standard Lattice Form
Page 345 and 346:
20.4 Field Types 336segments = ":"
Page 347 and 348:
20.5 Example SLF file 338J=24 S=19
Page 349 and 350:
Index 340default, 48format, 49types
Page 351 and 352:
Index 342SILMARGIN, 72SILSEQCOUNT,
Page 353 and 354:
Index 344output lattice format, 189
Page 355:
Index 346USEHAMMING, 59USEPOWER, 62
show all

The HTK Book Steve Young Gunnar Evermann Dan Kershaw ...

Create successful ePaper yourself

Delete template?

Save as template?