12.12.2012 Views

Festival Speech Synthesis System: - Speech Resource Pages

Festival Speech Synthesis System: - Speech Resource Pages

Festival Speech Synthesis System: - Speech Resource Pages

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

for i in wave/*.wav<br />

do<br />

fname=`basename $i .wav`<br />

echo $i<br />

lpc_analysis -reflection -shift 0.01 -order 18 -o lpc16k/$fname.lpc \<br />

-r lpc16k/$fname.res -otype htk -rtype nist $i<br />

done<br />

It is said that the LPC order should be sample rate divided by one thousand plus 2. This may or may not be<br />

appropriate and if you are particularly worried about the database size it is worth experimenting.<br />

The program `lpc_analysis', found in `speech_tools/bin', can be used to generate the lpc coefficients<br />

and residual. Note these should be reflection coefficients so they may be quantised (as they are in group files).<br />

The coefficients and residual files produced by different LPC analysis programs may start at different offsets. For<br />

example the Entropic's ESPS functions generate LPC coefficients that are offset by one frame shift (e.g. 0.01<br />

seconds). Our own `lpc_analysis' routine has no offset. The Diphone_Init parameter list allows these<br />

offsets to be specified. Using the above function to generate the LPC files the description parameters should include<br />

(lpc_frame_offset 0)<br />

(lpc_res_offset 0.0)<br />

While when generating using ESPS routines the description should be<br />

(lpc_frame_offset 1)<br />

(lpc_res_offset 0.01)<br />

The defaults actually follow the ESPS form, that is lpc_frame_offset is 1 and lpc_res_offset is equal to<br />

the frame shift, if they are not explicitly mentioned.<br />

Note the biggest problem we have in implementing the residual excited LPC resynthesizer was getting the right part<br />

of the residual to line up with the right LPC coefficients describing the pitch mark. Making errors in this degrades the<br />

synthesized waveform notably, but not seriously, making it difficult to determine if it is an offset problem or some<br />

other bug.<br />

Although we have started investigating if extracting pitch synchronous LPC parameters rather than fixed shift<br />

parameters gives better performance, we haven't finished this work. `lpc_analysis' supports pitch synchronous<br />

analysis but the raw "ungrouped" access method does not yet. At present the LPC parameters are extracted at a<br />

particular pitch mark by interpolating over the closest LPC parameters. The "group" files hold these interpolated<br />

parameters pitch synchronously.<br />

The American English voice `kd' was created using the speech tools `lpc_analysis' program and its set up<br />

should be looked at if you are going to copy it. The British English voice `rb' was constructed using ESPS<br />

routines.<br />

[ < ] [ > ] [ > ] [Top] [Contents] [Index] [ ? ]<br />

21.3 Group files<br />

Databases may be accessed directly but this is usually too inefficient for any purpose except debugging. It is expected<br />

that group files will be built which contain a binary representation of the database. A group file is a compact efficient<br />

representation of the diphone database. Group files are byte order independent, so may be shared between machines<br />

of different byte orders and word sizes. Certain information in a group file may be changed at load time so a database<br />

name, access strategy etc. may be changed from what was set originally in the group file.<br />

A group file contains the basic parameters, the diphone index, the signal (original waveform or LPC residual), LPC<br />

coefficients, and the pitch marks. It is all you need for a run-time synthesizer. Various compression mechanisms are

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!