Sphinx Guideline version 1

3.4 Decoding With Sphinx

As mentioned above, you need the following files for decoding:

• Trained models
• Dictionary
• Filler dictionary
• Language model / FSG
• Test data

To get a list of arguments for the Sphinx3 decoder, just type sphinx3_decode on the command line. You will get a list of arguments and their usage.

In general, the configuration file needs, at the very least, to contain the following arguments:

• -hmm followed by the acoustic model directory
• -dict followed by the pronunciation dictionary file
• -fdict followed by the filler dictionary
• -lm followed by the language model, OR -fsg followed by the grammar file
• -ctl followed by the control file (fileids)
• -mode fsg (for an FSG only; for an n-gram language model the default mode is used, so there is no need to specify this)

However, in this case, where the input is audio rather than feature files, several additional arguments are necessary:

• -adcin yes (tells the recognizer that input is audio, not feature files)
• -adchdr 44 (tells the recognizer to skip the 44-byte RIFF header)
• -cepext .wav (tells the recognizer input files have the .wav extension)
• -cepdir wav (tells the recognizer input files are in the 'wav' directory)
• -hyp followed by the output transcription file

If your training arguments differ from the defaults, set them manually by providing the following parameters:

• -hypseg followed by the recognition result file, with word segmentations and scores
• -logfn followed by the log file (default stdout/stderr)
• -verbose yes (show input filenames)
• -lowerf 133.33334 (lower edge of filters)
• -upperf 3500 (upper edge of filters)
• -nfft 256 (size of FFT)
• -wlen 0.0256 (Hamming window length)
• -nfilt 33 (number of filter banks)
• -ncep 13 (number of cepstral coefficients)
• -samprate 8000 (sampling rate)
• -dither yes (add 1/2-bit noise)

Parameters such as lowerf, upperf, nfft, wlen, nfilt and ncep should be the same as those used for training to get good results.
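The values of -samprate, -wlen and -nfft listed above are related: a window of 0.0256 s at 8000 Hz is 204.8 samples, and 256 is the smallest power of two that can hold it. The sketch below reproduces that arithmetic so you can sanity-check a different sampling rate. It assumes the front end needs the FFT size to be at least the window length in samples; min_nfft is a hypothetical helper name, not a Sphinx tool.

```shell
#!/bin/sh
# Sketch: smallest power-of-two FFT size that holds one analysis window.
# min_nfft is a hypothetical helper, not part of Sphinx.
min_nfft() {
    # $1 = sampling rate in Hz (-samprate), $2 = window length in seconds (-wlen)
    awk -v rate="$1" -v wlen="$2" 'BEGIN {
        samples = rate * wlen          # window length in samples (may be fractional)
        n = 1
        while (n < samples) n *= 2     # smallest power of two >= samples
        print n
    }'
}

# Values from the argument list above: prints the matching -nfft value.
min_nfft 8000 0.0256
```

If the printed value is larger than the -nfft you pass to sphinx3_decode, the window probably does not fit and the settings should be checked against those used for training.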
Follow the example below. In this case, the dictionary, filler dictionary and fileids are kept in the folder doc/. Trained models are kept in the folder model_1/. This folder should contain the following files:

• mdef
• mixture_weights
• means
• variances
• transition_matrices

wav_district is the directory where all the wav files are stored. The fileids file consists of the names of all the wav files in this folder. Create a directory LogOutFiles in which all the output files will be written.

    sphinx3_decode \
        -hmm model_1 \
        -lm data.lm.DMP \
        -lowerf 133.33334 \
        -upperf 3500 \
        -nfft 256 \
        -wlen 0.0256 \
        -nfilt 33 \
        -ncep 13 \
        -samprate 8000 \
        -mode fwdflat \
        -dither yes \
        -dict doc/marathiAgmark1500.dic \
        -fdict doc/marathiAgmark1500.filler \
        -ctl doc/marathiAgmark1500_train.fileids \
        -adcin yes \
        -adchdr 44 \
        -cepext .wav \
        -cepdir wav_district \
        -hyp LogOutFiles/1500spkr_s1000_g16.out.txt \
        -hypseg LogOutFiles/1500spkr_s1000_g16.hypseg \
        -logfn LogOutFiles/1500spkr_s1000_g16.log.txt
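Before running the decoder it can help to confirm that the layout described above is in place. The following is a minimal sketch, not part of Sphinx: check_model_dir and make_fileids are hypothetical helper names. It also assumes that entries in the fileids file are written without the .wav extension, since -cepext .wav is expected to add it back; adjust this if your control file is laid out differently.

```shell
#!/bin/sh
# Sketch of a pre-flight check for the decoding example.
# check_model_dir and make_fileids are hypothetical helpers, not Sphinx tools.

# Report any of the five model files named in the text that are missing.
# Returns 0 if all are present, 1 otherwise.
check_model_dir() {
    dir=$1
    missing=0
    for f in mdef mixture_weights means variances transition_matrices; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Print the name of every .wav file in a directory, one per line,
# with the extension removed (assumption: -cepext .wav restores it).
make_fileids() {
    for w in "$1"/*.wav; do
        [ -e "$w" ] || continue    # skip the literal pattern if nothing matched
        basename "$w" .wav
    done
}

# Usage with the paths from the example (uncomment to run):
# check_model_dir model_1 && mkdir -p LogOutFiles
# make_fileids wav_district > doc/marathiAgmark1500_train.fileids
```

The usage lines are left commented out so that sourcing the script does not overwrite an existing fileids file.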
