12.12.2012 Views

Festival Speech Synthesis System: - Speech Resource Pages


21.1 Diphone database format

A diphone database consists of a dictionary file, a set of waveform files, and a set of pitch mark files. These files are in the same format as those used by the previous CSTR (Osprey) synthesizer.

The dictionary file consists of one entry per line. Each entry consists of five fields: a diphone name of the form P1-P2, a filename (without extension), a floating point start position in the file in milliseconds, a mid position in milliseconds (the point where the phone changes), and an end position in milliseconds. Lines starting with a semi-colon and blank lines are ignored. The entries may be in any order.

For example, a partial list of diphone entries may look like this:

ch-l r021 412.035 463.009 518.23
jh-l d747 305.841 382.301 446.018
h-l d748 356.814 403.54 437.522
#-@ d404 233.628 297.345 331.327
@-# d001 836.814 938.761 1002.48
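The five-field layout described above can be read with a few lines of code. The following is only a minimal sketch in Python, not part of Festival or the speech tools; the names `DiphoneEntry` and `parse_diphone_dictionary` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class DiphoneEntry:
    """One line of the dictionary file (hypothetical container, for illustration)."""
    name: str        # diphone name of the form P1-P2
    filename: str    # file name without extension
    start_ms: float  # start position in milliseconds
    mid_ms: float    # position where the phone changes, in milliseconds
    end_ms: float    # end position in milliseconds


def parse_diphone_dictionary(lines: Iterable[str]) -> List[DiphoneEntry]:
    """Parse dictionary lines, skipping blank lines and lines starting with ';'."""
    entries = []
    for raw in lines:
        if not raw.strip() or raw.startswith(";"):
            continue
        fields = raw.split()
        if len(fields) != 5:
            raise ValueError(f"expected 5 fields, got {len(fields)}: {raw!r}")
        name, filename, start, mid, end = fields
        entries.append(
            DiphoneEntry(name, filename, float(start), float(mid), float(end))
        )
    return entries


# Usage with two entries from the example list, plus a comment and a blank line.
sample = [
    "; a comment line",
    "ch-l r021 412.035 463.009 518.23",
    "",
    "#-@ d404 233.628 297.345 331.327",
]
entries = parse_diphone_dictionary(sample)
```

Since the entries may be in any order, a caller that needs lookup by diphone name would typically build a dictionary keyed on `name` from the returned list.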

Waveform files may be in any format, headered or unheadered, as long as every file is of the same type and the format is supported by the speech tools wave reading functions. These may be standard linear PCM waveform files in the case of PSOLA, or LPC coefficients and residual when using the residual LPC synthesizer (see 21.2 LPC databases).

Pitch mark files consist of a simple list of the positions of each pitch mark in the file, in milliseconds (with decimal places), in order, one per line. For high quality diphone synthesis these should be derived from laryngograph data. During unvoiced sections pitch marks should be artificially created at reasonable intervals (e.g. 10 ms). In the current format there is no way to distinguish the "real" pitch marks from the "unvoiced" pitch marks.
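As an illustration of the pitch mark format, the sketch below reads a list of positions and fills long gaps with artificial marks at 10 ms intervals. It is a hedged example only: the function names are hypothetical, and the 20 ms threshold used to decide that a gap is "unvoiced" is an assumption made here, not something the format defines.

```python
from typing import Iterable, List


def read_pitch_marks(lines: Iterable[str]) -> List[float]:
    """Read one pitch mark position (in milliseconds) per line."""
    marks = [float(line) for line in lines if line.strip()]
    if any(later < earlier for earlier, later in zip(marks, marks[1:])):
        raise ValueError("pitch marks must be listed in order")
    return marks


def fill_unvoiced_gaps(
    marks: List[float], max_gap_ms: float = 20.0, step_ms: float = 10.0
) -> List[float]:
    """Insert artificial marks every step_ms inside gaps longer than max_gap_ms.

    max_gap_ms is an assumed threshold for treating a gap as unvoiced.
    """
    if not marks:
        return []
    filled = [marks[0]]
    for nxt in marks[1:]:
        prev = filled[-1]  # always the most recent original mark
        if nxt - prev > max_gap_ms:
            k = 1
            # Stop early so an artificial mark never lands within
            # half a step of the next real mark.
            while nxt - (prev + k * step_ms) >= step_ms / 2:
                filled.append(prev + k * step_ms)
                k += 1
        filled.append(nxt)
    return filled


marks = read_pitch_marks(["100.0", "105.0", "", "150.0"])
filled = fill_unvoiced_gaps(marks)
```

Note that once the artificial marks are written out, the resulting file carries no flag separating them from the real ones, which is exactly the limitation of the format mentioned above.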

It is normal to hold a diphone database in a directory with a number of sub-directories, namely `dic/' containing the dictionary file, `wave/' for the waveform files, typically of whole nonsense words (sometimes this directory is called `vox/' for historical reasons), and `pm/' for the pitch mark files. The filename in the dictionary entry should be the same for the waveform file and the pitch mark file (with different extensions).
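The directory convention means that a single dictionary filename maps to both a waveform file and a pitch mark file. A small sketch of that mapping follows; the helper name `diphone_paths` is hypothetical, and the extensions `.wav` and `.pm` are assumptions for illustration, since the text only says the extensions differ.

```python
from pathlib import PurePosixPath
from typing import Dict


def diphone_paths(
    db_root: str,
    filename: str,
    wave_dir: str = "wave",   # sometimes "vox" for historical reasons
    wave_ext: str = ".wav",   # assumed extension
    pm_ext: str = ".pm",      # assumed extension
) -> Dict[str, PurePosixPath]:
    """Build the waveform and pitch mark paths for one dictionary filename."""
    root = PurePosixPath(db_root)
    return {
        "wave": root / wave_dir / (filename + wave_ext),
        "pm": root / "pm" / (filename + pm_ext),
    }


paths = diphone_paths("mydb", "r021")
old_style = diphone_paths("mydb", "r021", wave_dir="vox")
```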


21.2 LPC databases

The standard method for diphone resynthesis in the released system is residual excited LPC (hunt89). The actual method of resynthesis isn't important to the database format, but if residual LPC synthesis is to be used then it is necessary to make the LPC coefficient files and their corresponding residuals.

Previous versions of the system used a "host of hacky little scripts" to do this, but now that the Edinburgh Speech Tools supports LPC analysis we can provide a walk through for generating these.

We assume that the waveform files of nonsense words are in a directory called `wave/'. The LPC coefficients and residuals will be, in this example, stored in `lpc16k/' with extensions `.lpc' and `.res' respectively.

Before starting it is worth considering power normalization. We have found this important on all of the databases we have collected so far. The ch_wave program, part of the speech tools, may be used with the option -scaleN 0.4 if a more complex method is not available.
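To make the idea of a simple normalization concrete, the sketch below scales a list of 16-bit samples so that the peak sits at a fixed fraction of full scale. This is an illustration of peak normalization in general, under the assumption that -scaleN 0.4 behaves along these lines; it is not taken from the ch_wave source, and the function name `scale_normalized` is hypothetical.

```python
from typing import List, Sequence


def scale_normalized(
    samples: Sequence[int], factor: float = 0.4, full_scale: int = 32767
) -> List[int]:
    """Scale samples so the largest magnitude equals factor * full_scale.

    Assumes signed 16-bit samples; silence is returned unchanged.
    """
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)
    gain = factor * full_scale / peak
    return [int(round(s * gain)) for s in samples]


quiet_word = [0, 1000, -2000]
normalized = scale_normalized(quiet_word)
```

Applying the same target level to every nonsense word brings the recordings to a comparable peak level, although a peak-based method does not equalize perceived loudness as well as a more complex method would.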

The following shell command generates the files
