Festival Speech Synthesis System: - Speech Resource Pages
Festival Speech Synthesis System: - Speech Resource Pages
Festival Speech Synthesis System: - Speech Resource Pages
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
for i in wave/*.wav<br />
do<br />
fname=`basename $i .wav`<br />
echo $i<br />
lpc_analysis -reflection -shift 0.01 -order 18 -o lpc16k/$fname.lpc \<br />
-r lpc16k/$fname.res -otype htk -rtype nist $i<br />
done<br />
It is said that the LPC order should be sample rate divided by one thousand plus 2. This may or may not be<br />
appropriate and if you are particularly worried about the database size it is worth experimenting.<br />
The program `lpc_analysis', found in `speech_tools/bin', can be used to generate the lpc coefficients<br />
and residual. Note these should be reflection coefficients so they may be quantised (as they are in group files).<br />
The coefficients and residual files produced by different LPC analysis programs may start at different offsets. For<br />
example the Entropic's ESPS functions generate LPC coefficients that are offset by one frame shift (e.g. 0.01<br />
seconds). Our own `lpc_analysis' routine has no offset. The Diphone_Init parameter list allows these<br />
offsets to be specified. Using the above function to generate the LPC files the description parameters should include<br />
(lpc_frame_offset 0)<br />
(lpc_res_offset 0.0)<br />
While when generating using ESPS routines the description should be<br />
(lpc_frame_offset 1)<br />
(lpc_res_offset 0.01)<br />
The defaults actually follow the ESPS form, that is lpc_frame_offset is 1 and lpc_res_offset is equal to<br />
the frame shift, if they are not explicitly mentioned.<br />
Note the biggest problem we have in implementing the residual excited LPC resynthesizer was getting the right part<br />
of the residual to line up with the right LPC coefficients describing the pitch mark. Making errors in this degrades the<br />
synthesized waveform notably, but not seriously, making it difficult to determine if it is an offset problem or some<br />
other bug.<br />
Although we have started investigating if extracting pitch synchronous LPC parameters rather than fixed shift<br />
parameters gives better performance, we haven't finished this work. `lpc_analysis' supports pitch synchronous<br />
analysis but the raw "ungrouped" access method does not yet. At present the LPC parameters are extracted at a<br />
particular pitch mark by interpolating over the closest LPC parameters. The "group" files hold these interpolated<br />
parameters pitch synchronously.<br />
The American English voice `kd' was created using the speech tools `lpc_analysis' program and its set up<br />
should be looked at if you are going to copy it. The British English voice `rb' was constructed using ESPS<br />
routines.<br />
[ < ] [ > ] [ > ] [Top] [Contents] [Index] [ ? ]<br />
21.3 Group files<br />
Databases may be accessed directly but this is usually too inefficient for any purpose except debugging. It is expected<br />
that group files will be built which contain a binary representation of the database. A group file is a compact efficient<br />
representation of the diphone database. Group files are byte order independent, so may be shared between machines<br />
of different byte orders and word sizes. Certain information in a group file may be changed at load time so a database<br />
name, access strategy etc. may be changed from what was set originally in the group file.<br />
A group file contains the basic parameters, the diphone index, the signal (original waveform or LPC residual), LPC<br />
coefficients, and the pitch marks. It is all you need for a run-time synthesizer. Various compression mechanisms are