Festival Speech Synthesis System: - Speech Resource Pages

After prediction the segmental duration is calculated by the simple formula

duration = mean + (zscore * standard deviation)

This method has also been used for duration models that scale an inherent duration by some factor: if the tree predicts factors rather than zscores, and the duration_ph_info entries are of the form phone, 0.0, inherent duration, the above formula generates the desired result. Klatt and Klatt-like rules can be implemented in this way without adding a new method.
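The two uses of the formula can be sketched as follows (a minimal illustration, not Festival's actual Scheme implementation; the function name is ours):

```python
def predicted_duration(mean, stddev, prediction):
    """duration = mean + (prediction * standard deviation).

    With a zscore model, `prediction` is a zscore and (mean, stddev) are the
    phone's duration statistics.  With a factor model, the entries are
    (phone, 0.0, inherent_duration), so `prediction` is a scaling factor and
    the same formula reduces to factor * inherent_duration.
    """
    return mean + prediction * stddev

# zscore model: mean 80 ms, sd 20 ms, predicted zscore 0.5 -> about 90 ms
z_dur = predicted_duration(0.080, 0.020, 0.5)

# factor model: entry (phone, 0.0, 0.070) with predicted factor 1.2
# -> 1.2 * 70 ms, i.e. about 84 ms
f_dur = predicted_duration(0.0, 0.070, 1.2)
```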


20. UniSyn synthesizer

Since 1.3 a new general synthesizer module has been included. This is designed to replace the older diphone synthesizer described in the next chapter. A redesign was made in order to have a generalized waveform synthesizer and signal processing module that could be used even when the units being concatenated are not diphones. Also at this stage the full diphone (or other) database pre-processing functions were added to the Speech Tools library.


20.1 UniSyn database format

The UniSyn synthesis modules can use databases in two basic formats: separate and grouped. Separate format is when all files (signal, pitchmark and coefficient files) are accessed individually during synthesis; this is the standard use during database development. Grouped format is when a database is collected together into a single special file containing all the information necessary for waveform synthesis; this format is designed for distribution and general use of the database.

A database should consist of a set of waveforms (which may be translated into a set of coefficients if the desired signal processing method requires it), a set of pitchmarks and an index. The pitchmarks are necessary as most of our current signal processing methods are pitch-synchronous.
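For one utterance in separate format, the pieces above can be sketched as a file layout check (the directory names `wav', `pm' and `lpc' are assumptions for illustration; the actual names come from the database's index and configuration):

```python
from pathlib import Path

def expected_files(db_root, base, need_coefs=True):
    """List the per-utterance files a separate-format database might hold.

    Assumed layout: waveform under wav/, pitchmarks under pm/, and (only when
    the signal processing method requires them) coefficients under lpc/.
    """
    root = Path(db_root)
    files = [
        root / "wav" / f"{base}.wav",  # waveform
        root / "pm" / f"{base}.pm",    # pitchmarks
    ]
    if need_coefs:
        files.append(root / "lpc" / f"{base}.lpc")  # coefficient file
    return files

# e.g. expected_files("mydb", "file001") lists three files for file001
```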


20.1.1 Generating pitchmarks

Pitchmarks may be derived from laryngograph (lar) files using the `pitchmark' program distributed with the speech tools. Choosing the parameters for this program is still a bit of an art. The first major issue is which way up the lar files are: we have seen both, though CSTR's tend to be upside down while others' (e.g. OGI's) are the right way up. The -inv argument to `pitchmark' is specifically provided to cater for this. There are other issues in getting the pitchmarks aligned. The basic command for generating pitchmarks is

    pitchmark -inv lar/file001.lar -o pm/file001.pm -otype est \
       -min 0.005 -max 0.012 -fill -def 0.01 -wave_end

The `-min', `-max' and `-def' (fill value for unvoiced regions) options may need to be changed depending on the speaker's pitch range; the values above are suitable for a male speaker. The `-fill' option states that unvoiced sections should be filled with equally spaced pitchmarks.
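Since `-min' and `-max' bound the pitch period, they are just reciprocals of the speaker's F0 range; the example values above correspond roughly to 83-200 Hz. A small sketch of that conversion (the helper name is ours, and computing `-def' as the midpoint of the period range is only one plausible choice; the manual's example simply uses 0.01):

```python
def pitchmark_params(f0_min_hz, f0_max_hz):
    """Derive pitchmark period bounds from a speaker's F0 range (in Hz)."""
    min_period = 1.0 / f0_max_hz  # shortest pitch period (highest F0)
    max_period = 1.0 / f0_min_hz  # longest pitch period (lowest F0)
    return {
        "min": round(min_period, 4),
        "max": round(max_period, 4),
        # one plausible fill period for unvoiced regions: mid-range
        "def": round((min_period + max_period) / 2.0, 4),
    }

# A male range of roughly 83-200 Hz recovers the -min/-max values above.
male = pitchmark_params(83.0, 200.0)
```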
