12.12.2012 Views

Festival Speech Synthesis System: - Speech Resource Pages

Festival Speech Synthesis System: - Speech Resource Pages

Festival Speech Synthesis System: - Speech Resource Pages

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

26. Building models from databases<br />

Because our research interests tend towards creating statistical models trained from real speech data, <strong>Festival</strong> offers<br />

various support for extracting information from speech databases, in a way suitable for building models.<br />

Models for accent prediction, F0 generation, duration, vowel reduction, homograph disambiguation, phrase break<br />

assignment and unit selection have been built using <strong>Festival</strong> to extract and process various databases.<br />

26.1 Labelling databases Phones, syllables, words etc.<br />

26.2 Extracting features Extraction of model parameters.<br />

26.3 Building models Building stochastic models from features<br />

[ < ] [ > ] [ > ] [Top] [Contents] [Index] [ ? ]<br />

26.1 Labelling databases<br />

In order for <strong>Festival</strong> to use a database it is most useful to build utterance structures for each utterance in the database.<br />

As discussed earlier, utterance structures contain relations of items. Given such a structure for each utterance in a<br />

database we can easily read in the utterance representation and access it, dumping information in a normalised way<br />

allowing for easy building and testing of models.<br />

Of course the level of labelling that exists, or that you are willing to do by hand or using some automatic tool, for a<br />

particular database will vary. For many purposes you will at least need phonetic labelling. Hand labelled data is still<br />

better than auto-labelled data, but that could change. The size and consistency of the data is important too.<br />

For this discussion we will assume labels for: segments, syllables, words, phrases, intonation events, pitch targets.<br />

Some of these can be derived, some need to be labelled. This would not fail with less labelling but of course you<br />

wouldn't be able to extract as much information from the result.<br />

In our databases these labels are in Entropic's Xlabel format, though it is fairly easy to convert any reasonable format.<br />

Segment<br />

These give phoneme labels for files. Note the these labels must be members of the phoneset that you will be<br />

using for this database. Often phone label files may contain extra labels (e.g. beginning and end silence)<br />

which are not really part of the phoneset. You should remove (or re-label) these phones accordingly.<br />

Word<br />

Again these will need to be provided. The end of the word should come at the last phone in the word (or just<br />

after). Pauses/silences should not be part of the word.<br />

Syllable<br />

There is a chance these can be automatically generated from Word and Segment files given a lexicon. Ideally<br />

these should include lexical stress.<br />

IntEvent<br />

These should ideally mark accent/boundary tone type for each syllable, but this almost definitely requires<br />

hand-labelling. Also given that hand-labelling of accent type is harder and not as accurate, it is arguable that<br />

anything other than accented vs. non-accented can be used reliably.<br />

Phrase<br />

This could just mark the last non-silence phone in each utterance, or before any silence phones in the whole<br />

utterance.<br />

Target<br />

This can be automatically derived from an F0 file and the Segment files. A marking of the mean F0 in each<br />

voiced phone seem to give adequate results.<br />

Once these files are created an utterance file can be automatically created from the above data. Note it is pretty easy<br />

to get the streams right but getting the relations between the streams is much harder. Firstly labelling is rarely<br />

accurate and small windows of error must be allowed to ensure things line up properly. The second problem is that

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!