Festival Speech Synthesis System: - Speech Resource Pages
After prediction, the segmental duration is calculated by the simple formula
duration = mean + (zscore * standard deviation)
This method has also been used for some other duration models that modify an inherent duration by some factor. If the tree
predicts factors rather than zscores and the duration_ph_info entries are phone, 0.0, inherent duration, the
above formula will generate the desired result. Klatt and Klatt-like rules can be implemented in this way without
adding a new method.
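The arithmetic can be illustrated with a minimal Python sketch. This is not Festival code: the function name `segment_duration` and the numeric phone statistics are hypothetical values chosen only to show how the same formula covers both the zscore case and the factor case.

```python
def segment_duration(predicted, mean, stddev):
    """Apply duration = mean + (predicted * stddev).

    `predicted` is whatever the tree returns: a zscore in the usual
    case, or a scaling factor in the Klatt-like case.
    """
    return mean + predicted * stddev


# Zscore case: hypothetical entry (phone, mean, standard deviation).
zscore_entry = ("a", 0.100, 0.020)
_, mean, stddev = zscore_entry
zscore_duration = segment_duration(1.5, mean, stddev)

# Factor case: the entry is (phone, 0.0, inherent duration), so the
# mean term vanishes and the result is factor * inherent duration.
factor_entry = ("a", 0.0, 0.120)
_, zero_mean, inherent = factor_entry
factor_duration = segment_duration(0.8, zero_mean, inherent)

print(round(zscore_duration, 3), round(factor_duration, 3))
```

With a mean of 0.0, the "standard deviation" slot simply carries the inherent duration, which is why no new method is needed for factor-based models.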
20. UniSyn synthesizer<br />
Since 1.3 a new general synthesizer module has been included. This is designed to replace the older diphone
synthesizer described in the next chapter. A redesign was made in order to have a generalized waveform synthesizer,
a signal processing module that could be used even when the units being concatenated are not diphones. Also at this
stage the full diphone (or other) database pre-processing functions were added to the Speech Tool library.
20.1 UniSyn database format<br />
The UniSyn synthesis modules can use databases in two basic formats, separate and grouped. Separate is when all
files (signal, pitchmark and coefficient files) are accessed individually during synthesis. This is the standard use
during database development. Group format is when a database is collected together into a single special file
containing all information necessary for waveform synthesis. This format is designed to be used for distribution and
general use of the database.
A database should consist of a set of waveforms (which may be translated into a set of coefficients if the desired
signal processing method requires it), a set of pitchmarks and an index. The pitchmarks are necessary as most of our
current signal processing methods are pitch synchronous.
20.1.1 Generating pitchmarks<br />
Pitchmarks may be derived from laryngograph files using the `pitchmark' program distributed with
the speech tools. The actual parameters to this program are still a bit of an art form. The first major issue is which
way up the lar files are. We have seen both, though it does seem that CSTR's ones are most often upside down while
others (e.g. OGI's) are the right way up. The -inv argument to `pitchmark' is specifically provided to cater for
this. There are other issues in getting the pitchmarks aligned. The basic command for generating pitchmarks is
pitchmark -inv lar/file001.lar -o pm/file001.pm -otype est \<br />
-min 0.005 -max 0.012 -fill -def 0.01 -wave_end<br />
The `-min', `-max' and `-def' (fill values for unvoiced regions) arguments may need to be changed depending on the
speaker's pitch range. The values above are suitable for a male speaker. The `-fill' option states that unvoiced sections
should be filled with equally spaced pitchmarks.
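When pitchmarking a whole database, it can help to build the command once and vary only the file name and the speaker-dependent values. The following Python sketch only assembles the argument list shown above; it does not run `pitchmark'. The helper name `pitchmark_command`, its parameter names, and the `pm` output directory default are illustrative assumptions, and only flags that appear in the command above are used.

```python
from pathlib import Path


def pitchmark_command(lar_file, pm_dir="pm", invert=True,
                      min_value=0.005, max_value=0.012, default_value=0.01):
    """Build the argument list for one `pitchmark' call.

    The numeric values are passed through unchanged to -min, -max and
    -def; the defaults are the ones quoted for a male speaker.
    """
    lar = Path(lar_file)
    out = Path(pm_dir) / (lar.stem + ".pm")

    cmd = ["pitchmark"]
    if invert:
        # Only needed when the laryngograph signal is upside down.
        cmd.append("-inv")
    cmd += [
        lar.as_posix(),
        "-o", out.as_posix(),
        "-otype", "est",
        "-min", str(min_value),
        "-max", str(max_value),
        "-fill",
        "-def", str(default_value),
        "-wave_end",
    ]
    return cmd


print(" ".join(pitchmark_command("lar/file001.lar")))
```

The resulting list could be handed to `subprocess.run` once the values have been checked against a few files by hand. For lar files that are already the right way up, pass `invert=False` so that `-inv` is omitted.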