The HTK Book Steve Young Gunnar Evermann Dan Kershaw ...

3.1 Data Preparation

which means that the word WORD is pronounced as the sequence of phones p1 p2 p3 .... The string in square brackets specifies the string to output when that word is recognised. If it is omitted, the word itself is output. If it is included but empty, nothing is output. To see what the dictionary is like, here are a few entries.

    A           ah sp
    A           ax sp
    A           ey sp
    CALL        k ao l sp
    DIAL        d ay ax l sp
    EIGHT       ey t sp
    PHONE       f ow n sp
    SENT-END    []  sil
    SENT-START  []  sil
    SEVEN       s eh v n sp
    TO          t ax sp
    TO          t uw sp
    ZERO        z ia r ow sp

Notice that function words such as A and TO have multiple pronunciations. The entries for SENT-START and SENT-END have the silence model sil as their pronunciation and null output symbols, so nothing is output when they are recognised.

3.1.3 Step 3 - Recording the Data

The training and test data will be recorded using the HTK tool HSLab, which is a combined waveform recording and labelling tool. In this example HSLab will be used just for recording, because labels already exist. However, if you do not have pre-existing training sentences (such as those from the TIMIT database), you can create them either from pre-existing text (as described above) or by labelling your training utterances using HSLab. HSLab is invoked by typing

    HSLab noname

This causes a window to appear with a waveform display area in the upper half and a row of buttons, including a record button, in the lower half. When the name of a normal file is given as an argument, HSLab displays its contents. Here, the special file name noname indicates that new data is to be recorded. HSLab makes no special provision for prompting the user. However, each time the record button is pressed, it writes the subsequent recording alternately to files named noname_0. and noname_1. (the trailing dot is part of each name). It is therefore simple to write a shell script which, for each successive line of a prompt file, outputs the prompt, waits for either noname_0. or noname_1. to appear, and then renames that file to the name preceding the prompt on that line (see Fig. 3.4).

The prompts for the training sentences were already provided above, but the prompts for the test sentences must be generated before they can be recorded. The tool HSGen can do this by randomly traversing a word network and outputting each word encountered. For example, typing

    HSGen -l -n 200 wdnet dict > testprompts

would generate 200 numbered test utterances (-n 200 sets the number of utterances and -l numbers them). The first few would look something like:

    1. PHONE YOUNG
    2. DIAL OH SIX SEVEN SEVEN OH ZERO
    3. DIAL SEVEN NINE OH OH EIGHT SEVEN NINE NINE
    4. DIAL SIX NINE SIX TWO NINE FOUR ZERO NINE EIGHT
    5. CALL JULIAN ODELL
    ... etc

Redirected to a file as in the command above, this output forms the prompt file testprompts for the required test data.
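The dictionary line format described above can be illustrated with a small parser. Each line holds a word, an optional bracketed output string, and then the phones. The parser applies the output rule: an omitted bracket field outputs the word itself, and an empty one outputs nothing. This is a minimal sketch that handles only the `WORD [OUTSYM] p1 p2 ...` form given in this section. The names `DictEntry`, `parse_entry`, `output_string` and `load_dictionary` are hypothetical and are not part of HTK.

```python
import re
from typing import NamedTuple, Optional


class DictEntry(NamedTuple):
    word: str
    outsym: Optional[str]  # None when no [...] field is present
    phones: list[str]


# WORD, then an optional [OUTSYM] field, then one or more phones.
ENTRY_RE = re.compile(r"^(\S+)\s+(?:\[([^\]]*)\]\s+)?(\S.*)$")


def parse_entry(line: str) -> DictEntry:
    """Split one dictionary line into word, output symbol and phone list."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a dictionary entry: {line!r}")
    word, outsym, phones = m.groups()
    return DictEntry(word, outsym, phones.split())


def output_string(entry: DictEntry) -> str:
    """Omitted [...] -> output the word itself; empty [] -> output nothing."""
    return entry.word if entry.outsym is None else entry.outsym


def load_dictionary(lines) -> dict[str, list[DictEntry]]:
    """Group entries by word, keeping every pronunciation (e.g. A, TO)."""
    prons: dict[str, list[DictEntry]] = {}
    for line in lines:
        if line.strip():
            entry = parse_entry(line)
            prons.setdefault(entry.word, []).append(entry)
    return prons
```

Keeping every entry under the same key mirrors the point above that function words such as A and TO carry several pronunciations.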
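The prompt-and-rename script described above for use with HSLab could look roughly like the loop below. It is written in Python rather than shell, and it rests on several assumptions:

- Each prompt-file line starts with the name for its recording, followed by the prompt text.
- The names noname_0. and noname_1. are taken literally as written in the text, trailing dot included.
- A file is complete once it appears.
- The `.wav` extension, polling interval and timeout are arbitrary.
- `after_prompt` is a hypothetical hook used only to simulate HSLab when testing.

```python
import os
import time
from pathlib import Path
from typing import Callable, Iterable, Optional

# File names HSLab writes alternately when started as "HSLab noname".
CANDIDATES = ("noname_0.", "noname_1.")


def wait_for_recording(workdir: Path, timeout: float, poll: float = 0.2) -> Path:
    """Poll until noname_0. or noname_1. exists in workdir and return its path."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        for name in CANDIDATES:
            path = workdir / name
            if path.exists():
                return path
        time.sleep(poll)
    raise TimeoutError("no recording appeared in time")


def record_prompts(prompt_lines: Iterable[str], workdir: Path,
                   ext: str = ".wav", timeout: float = 600.0,
                   after_prompt: Optional[Callable[[], None]] = None) -> list[Path]:
    """Show each prompt, wait for HSLab's output file, rename it to <name><ext>."""
    saved: list[Path] = []
    for line in prompt_lines:
        fields = line.split(maxsplit=1)
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, prompt = fields
        print(prompt, flush=True)  # display the prompt to the speaker
        if after_prompt is not None:
            after_prompt()  # testing hook; not needed when HSLab is recording
        recording = wait_for_recording(workdir, timeout)
        target = workdir / (name + ext)
        os.replace(recording, target)  # noname_N. -> <name><ext>
        saved.append(target)
    return saved
```

Renaming each file as soon as it appears means the next wait can only be satisfied by HSLab's next recording.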
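HSGen is described above as generating prompts by randomly traversing a word network and outputting each word it meets. The toy sketch below illustrates that idea. The network representation, `TOY_NET` and the helper names are hypothetical stand-ins, not HTK's wdnet format or HSGen's actual implementation:

- The network is a dict mapping each node to arcs labelled with a word, or with `None` for a null arc.
- `TOY_NET` is loosely modelled on the sample prompts.
- No dictionary is used.

```python
import random
from typing import Optional

# node -> list of (word, or None for a null arc, next node); "END" is the exit.
Network = dict[str, list[tuple[Optional[str], str]]]

DIGITS = ["OH", "ZERO", "ONE", "TWO", "THREE", "FOUR",
          "FIVE", "SIX", "SEVEN", "EIGHT", "NINE"]

# Hypothetical toy network, loosely shaped like the sample prompts above.
TOY_NET: Network = {
    "START": [("DIAL", "DIGIT"), ("CALL", "NAME"), ("PHONE", "NAME")],
    "DIGIT": [(d, "AFTER_DIGIT") for d in DIGITS],
    "AFTER_DIGIT": [(None, "DIGIT"), (None, "END")],  # another digit, or stop
    "NAME": [("JULIAN", "N_ODELL"), (None, "N_ODELL"),
             ("STEVE", "N_YOUNG"), (None, "N_YOUNG")],
    "N_ODELL": [("ODELL", "END")],
    "N_YOUNG": [("YOUNG", "END")],
}


def random_sentence(net: Network, rng: random.Random, start: str = "START",
                    end: str = "END", max_words: int = 100) -> list[str]:
    """Walk from start to end, choosing each outgoing arc uniformly at random."""
    node, words = start, []
    while node != end:
        word, node = rng.choice(net[node])
        if word is not None:  # null arcs produce no output
            words.append(word)
        if len(words) > max_words:
            raise RuntimeError("walk did not reach the end node")
    return words


def generate_prompts(net: Network, n: int, seed: Optional[int] = None) -> list[str]:
    """Return n numbered sentences in the style of the HSGen -l sample output."""
    rng = random.Random(seed)
    return [f"{i}. {' '.join(random_sentence(net, rng))}" for i in range(1, n + 1)]
```

Passing a seed makes the generated list reproducible, which is convenient when the same test prompts must be regenerated.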
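A recording script like the one described above names each file after the first field of a prompt line. The numbered HSGen output has only a number with a dot in that position, so it may need converting first. One hypothetical way is to replace the number with an utterance name. The `T` prefix and four-digit zero padding below are illustrative choices, not a convention stated in this section.

```python
import re

# "12. DIAL OH SIX" -> number "12", words "DIAL OH SIX"
NUMBERED_RE = re.compile(r"^\s*(\d+)\.\s+(.*\S)\s*$")


def to_named_prompts(lines, prefix: str = "T", width: int = 4) -> list[str]:
    """Turn numbered lines into '<prefix><zero-padded number> <words>' lines."""
    named = []
    for line in lines:
        m = NUMBERED_RE.match(line)
        if m is None:
            continue  # skip blank or unnumbered lines such as "... etc"
        number, words = m.groups()
        named.append(f"{prefix}{int(number):0{width}d} {words}")
    return named
```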
