12.12.2012 Views

Festival Speech Synthesis System: - Speech Resource Pages



20.1.2 Generating LPC coefficients<br />

Two stages are required: generating the LPC coefficients with the `sig2fv' command, and generating the residual with `sigfilter'. The prototypical commands for these are:<br />

sig2fv wav/file001.wav -o lpc/file001.lpc -otype est -lpc_order 16 \<br />

-coefs "lpc" -pm pm/file001.pm -preemph 0.95 -factor 3 \<br />

-window_type hamming<br />

sigfilter wav/file001.wav -o lpc/file001.res -otype nist \<br />

-lpcfilter lpc/file001.lpc -inv_filter<br />
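
To run both stages over a whole database, the two commands above can be wrapped in a loop. The following is only a sketch, not part of the Festival distribution: the function names `run', `lpc_one' and `lpc_all' and the `DRY_RUN' variable are hypothetical, and it assumes the `wav/', `pm/' and `lpc/' directory layout used in the example commands, with `lpc/' already existing. The flags are exactly those shown above.

```shell
#!/bin/sh
# Sketch: run both LPC stages for every waveform in wav/.
# Set DRY_RUN=1 to print the commands instead of executing them.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# lpc_one BASENAME: process wav/BASENAME.wav (hypothetical helper).
lpc_one() {
    base=$1
    # Stage 1: LPC coefficients, using the pitchmarks in pm/.
    run sig2fv "wav/$base.wav" -o "lpc/$base.lpc" -otype est -lpc_order 16 \
        -coefs "lpc" -pm "pm/$base.pm" -preemph 0.95 -factor 3 \
        -window_type hamming
    # Stage 2: residual, by inverse filtering with those coefficients.
    run sigfilter "wav/$base.wav" -o "lpc/$base.res" -otype nist \
        -lpcfilter "lpc/$base.lpc" -inv_filter
}

# lpc_all: process every wav/*.wav; does nothing if there are none.
lpc_all() {
    for wav in wav/*.wav; do
        [ -e "$wav" ] || continue
        lpc_one "$(basename "$wav" .wav)"
    done
}

lpc_all
```

Running the script with `DRY_RUN=1' set in the environment prints the commands that would be executed, which is a cheap way to check the file names before starting a long run.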

For some databases you may need to normalize the power. Properly normalizing power is difficult, but we provide a simple function which may do the job acceptably. You should do this on the waveform before LPC analysis (and ensure you also do the residual extraction on the normalized waveform rather than the original).<br />

ch_wave -scaleN 0.5 wav/file001.wav -o file001.Nwav<br />

This normalizes the power by first maximizing the signal and then multiplying it by the given factor. If the database waveforms are clean (i.e. no clicks) this can give reasonable results.<br />


20.2 Generating a diphone index<br />

The diphone index consists of a short header followed by an ASCII list of the diphones. Each entry gives the diphone, the file it comes from, and its start, middle and end times in seconds. For most databases this file needs to be generated by some database-specific script.<br />

An example header is<br />

EST_File index<br />

DataType ascii<br />

NumEntries 2005<br />

IndexName rab_diphone<br />

EST_Header_End<br />

The most notable part is the number of entries, which can get out of sync with the actual number of entries if you edit the index by hand. That is, if you add an entry and the system still can't find it, check that the number of entries is right.<br />
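
A quick way to catch this mismatch is to count the entries and compare the count with the header. The sketch below is a hypothetical helper, not a Festival tool. It assumes that every non-empty line after `EST_Header_End' is one entry and that `NumEntries' appears once in the header, as in the example above.

```shell
#!/bin/sh
# check_numentries FILE: compare the header's NumEntries value with the
# number of non-empty lines after EST_Header_End.
# Prints both counts; exit status is 0 if they match, 1 otherwise.
check_numentries() {
    awk '
        !in_body && $1 == "NumEntries"     { declared = $2 }
        !in_body && $1 == "EST_Header_End" { in_body = 1; next }
        in_body && NF > 0                  { actual++ }
        END {
            printf "declared=%d actual=%d\n", declared, actual
            exit (declared + 0 == actual + 0 ? 0 : 1)
        }
    ' "$1"
}
```

For example, `check_numentries my_index || echo "fix NumEntries"', where `my_index' stands for your own index file, prints a reminder when the header is stale.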

The entries themselves may take one of two forms: full entries or index entries. A full entry consists of a diphone name, where the phones are separated by "-"; a file name, which is used to index into the pitchmark, LPC and waveform files; and the start, middle (change-over point between phones) and end of the diphone in that file, in seconds. For example<br />

r-uh edx_1001 0.225 0.261 0.320<br />

r-e edx_1002 0.224 0.273 0.326<br />

r-i edx_1003 0.240 0.280 0.321<br />

r-o edx_1004 0.212 0.253 0.320<br />
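
Because these entries are usually produced by a script, a simple sanity check on them can be worthwhile. The following sketch is a hypothetical helper, not part of Festival. It assumes a full entry has exactly five whitespace-separated fields and silently skips any other line after the header, on the assumption that it is an index entry. It reports names without a "-" separator and times that are not strictly increasing.

```shell
#!/bin/sh
# check_full_entries FILE: report full entries whose fields look wrong.
# Prints nothing if every five-field entry passes both checks.
check_full_entries() {
    awk '
        # Skip the header, up to and including EST_Header_End.
        !in_body { if ($1 == "EST_Header_End") in_body = 1; next }
        # Assumption: lines without five fields are not full entries.
        NF != 5  { next }
        {
            if ($1 !~ /^.+-.+$/)
                printf "line %d: bad diphone name %s\n", NR, $1
            if (!($3 + 0 < $4 + 0 && $4 + 0 < $5 + 0))
                printf "line %d: times not increasing for %s\n", NR, $1
        }
    ' "$1"
}
```

The line numbers it reports count from the top of the file, header included, so they can be used directly in an editor.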

The second form of entry is an index entry, which simply states that any reference to that diphone should actually be made to another. For example
