Festival Speech Synthesis System: - Speech Resource Pages
21.1 Diphone database format
A diphone database consists of a dictionary file, a set of waveform files, and a set of pitch mark files. These files are
in the same format as those used by the previous CSTR (Osprey) synthesizer.
The dictionary file consists of one entry per line. Each entry consists of five fields: a diphone name of the form P1-P2,
a filename (without extension), a floating point start position in the file in milliseconds, a mid position in
milliseconds (the point where the phone changes), and an end position in milliseconds. Lines starting with a semi-colon
and blank lines are ignored. The entries may be in any order.
For example, a partial list of diphone entries may look like this:
ch-l r021 412.035 463.009 518.23
jh-l d747 305.841 382.301 446.018
h-l d748 356.814 403.54 437.522
#-@ d404 233.628 297.345 331.327
@-# d001 836.814 938.761 1002.48
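The five-field entry format described above can be read with a few lines of Python. This is only a minimal sketch, not Festival's own reader: the names `DiphoneEntry` and `parse_diphone_dictionary` are hypothetical, and the sketch assumes fields are separated by whitespace, as in the sample entries.

```python
from typing import List, NamedTuple


class DiphoneEntry(NamedTuple):
    """One dictionary line (hypothetical container, not a Festival type)."""
    name: str        # diphone name of the form P1-P2, e.g. "ch-l"
    filename: str    # file name without extension
    start_ms: float  # start position in milliseconds
    mid_ms: float    # position of the phone change in milliseconds
    end_ms: float    # end position in milliseconds


def parse_diphone_dictionary(text: str) -> List[DiphoneEntry]:
    """Parse dictionary text, skipping blank lines and ';' comment lines."""
    entries = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(";"):
            continue
        fields = line.split()  # assumption: whitespace-separated fields
        if len(fields) != 5:
            raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
        start, mid, end = (float(f) for f in fields[2:])
        entries.append(DiphoneEntry(fields[0], fields[1], start, mid, end))
    return entries


SAMPLE = """\
; a comment line, ignored

ch-l r021 412.035 463.009 518.23
jh-l d747 305.841 382.301 446.018
h-l d748 356.814 403.54 437.522
#-@ d404 233.628 297.345 331.327
@-# d001 836.814 938.761 1002.48
"""

entries = parse_diphone_dictionary(SAMPLE)
```

Because the entries may appear in any order, a caller would typically index the result by diphone name, for example with `{e.name: e for e in entries}`.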
Waveform files may be in any form, headered or unheadered, as long as every file is the same type and the
format is supported by the speech tools wave reading functions. These may be standard linear PCM waveform files in
the case of PSOLA, or LPC coefficients and residual when using the residual LPC synthesizer.
Pitch mark files consist of a simple list of positions in milliseconds (plus places after the point), in order, one per
line, for each pitch mark in the file. For high quality diphone synthesis these should be derived from laryngograph data.
During unvoiced sections pitch marks should be artificially created at reasonable intervals (e.g. 10 ms). In the current
format there is no way to distinguish the "real" pitch marks from the "unvoiced" pitch marks.
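As an illustration of filling unvoiced sections, the sketch below inserts artificial marks at a fixed interval wherever two consecutive pitch marks are far apart. The function names and the `max_gap_ms` threshold are assumptions made for this example; the source only says that unvoiced sections should receive marks at reasonable intervals such as 10 ms, and does not say how unvoiced sections are detected.

```python
def fill_unvoiced_gaps(marks_ms, interval_ms=10.0, max_gap_ms=20.0):
    """Return sorted pitch marks with artificial marks added in long gaps.

    A gap longer than max_gap_ms (a hypothetical threshold) is treated as
    unvoiced and filled with marks every interval_ms milliseconds.
    """
    filled = []
    previous = None
    for mark in sorted(marks_ms):
        if previous is not None and mark - previous > max_gap_ms:
            k = 1
            # Stop early enough that no artificial mark lands within half
            # an interval of the next real mark.
            while previous + k * interval_ms < mark - interval_ms / 2:
                filled.append(previous + k * interval_ms)
                k += 1
        filled.append(mark)
        previous = mark
    return filled


def format_pitch_marks(marks_ms):
    """Write marks one per line, in milliseconds with three decimal places."""
    return "".join(f"{m:.3f}\n" for m in marks_ms)


marks = fill_unvoiced_gaps([100.0, 105.0, 150.0])
```

Once written out, the artificial marks look exactly like the real ones, which is the limitation of the format noted above.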
It is normal to hold a diphone database in a directory with a number of sub-directories, namely `dic/' containing the
dictionary file, `wave/' for the waveform files, typically of whole nonsense words (sometimes this directory is
called `vox/' for historical reasons), and `pm/' for the pitch mark files. The filename in the dictionary entry
should be the same for the waveform file and the pitch mark file (with different extensions).
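The directory convention can be sketched as a small path helper. The function name `diphone_paths` and the default extensions `.wav` and `.pm` are assumptions for illustration only; the source fixes the sub-directory names but not the extensions.

```python
from pathlib import PurePosixPath


def diphone_paths(db_root, filename, wave_ext=".wav", pm_ext=".pm"):
    """Return (waveform path, pitch mark path) for one dictionary filename.

    The extensions are hypothetical defaults; use whatever the database
    actually contains.
    """
    root = PurePosixPath(db_root)
    wave_path = root / "wave" / (filename + wave_ext)
    pm_path = root / "pm" / (filename + pm_ext)
    return wave_path, pm_path


wave_path, pm_path = diphone_paths("mydb", "r021")
```

Both paths share the stem taken from the dictionary entry, so a single filename field is enough to locate the waveform and its pitch marks.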
21.2 LPC databases
The standard method for diphone resynthesis in the released system is residual excited LPC (hunt89). The actual<br />
method of resynthesis isn't important to the database format, but if residual LPC synthesis is to be used then it is<br />
necessary to make the LPC coefficient files and their corresponding residuals.<br />
Previous versions of the system used a "host of hacky little scripts" to do this, but now that the Edinburgh Speech Tools
support LPC analysis we can provide a walk through for generating these.
We assume that the waveform files of nonsense words are in a directory called `wave/'. The LPC coefficients and
residuals will be, in this example, stored in `lpc16k/' with extensions `.lpc' and `.res' respectively.
Before starting it is worth considering power normalization. We have found this important on all of the databases we
have collected so far. The ch_wave program, part of the speech tools, with the option -scaleN 0.4 may be
used if a more complex method is not available.
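To make the idea of a simple normalization concrete, here is a rough sketch of peak-based scaling on 16-bit samples. It is not the ch_wave implementation: it assumes that the -scaleN option first scales the waveform to its maximum level and then applies the given factor, which should be checked against the speech tools documentation. The function name and the `full_scale` parameter are hypothetical.

```python
def scale_normalized(samples, factor=0.4, full_scale=32767):
    """Rescale samples so the largest magnitude equals factor * full_scale.

    Assumption: this mirrors a normalize-then-scale operation; it is a
    sketch, not the speech tools code.
    """
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)  # silent or empty input: nothing to scale
    gain = factor * full_scale / peak
    return [int(round(s * gain)) for s in samples]


scaled = scale_normalized([0, 1000, -2000])
```

Applying the same factor to every file brings the peaks of all nonsense-word recordings to a common level, which is the simplest way to reduce loudness differences between them.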
The following shell command generates the files