12.12.2012 Views

Festival Speech Synthesis System: - Speech Resource Pages

Festival Speech Synthesis System: - Speech Resource Pages

Festival Speech Synthesis System: - Speech Resource Pages

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

The process involves the following steps<br />

● Pre-processing lexicon into suitable training set<br />

● Defining the set of allowable pairing of letters to phones. (We intend to do this fully automatically in future<br />

versions).<br />

● Constructing the probabilities of each letter/phone pair.<br />

● Aligning letters to an equal set of phones/_epsilons_.<br />

● Extracting the data by letter suitable for training.<br />

● Building CART models for predicting phone from letters (and context).<br />

● Building additional lexical stress assignment model (if necessary).<br />

All except the first two stages of this are fully automatic.<br />

Before building a model its wise to think a little about what you want it to do. Ideally the model is an auxiluary to the<br />

lexicon so only words not found in the lexicon will require use of the letter to sound rules. Thus only unusual forms<br />

are likely to require the rules. More precisely the most common words, often having the most non-standard<br />

pronunciations, should probably be explicitly listed always. It is possible to reduce the size of the lexicon (sometimes<br />

drastically) by removing all entries that the training LTS model correctly predicts.<br />

Before starting it is wise to consider removing some entries from the lexicon before training, I typically will remove<br />

words under 4 letters and if part of speech information is available I remove all function words, ideally only training<br />

from nouns verbs and adjectives as these are the most likely forms to be unknown in text. It is useful to have<br />

morphologically inflected and derived forms in the training set as it is often such variant forms that not found in the<br />

lexicon even though their root morpheme is. Note that in many forms of text, proper names are the most common<br />

form of unknown word and even the technique presented here may not adequately cater for that form of unknown<br />

words (especially if they unknown words are non-native names). This is all stating that this may or may not be<br />

appropriate for your task but the rules generated by this learning process have in the examples we've done been much<br />

better than what we could produce by hand writing rules of the form described in the previous section.<br />

First preprocess the lexicon into a file of lexical entries to be used for training, removing functions words and<br />

changing the head words to all lower case (may be language dependent). The entries should be of the form used for<br />

input for <strong>Festival</strong>'s lexicon compilation. Specifical the pronunciations should be simple lists of phones (no<br />

syllabification). Depending on the language, you may wish to remve the stressing--for examples here we have though<br />

later tests suggest that we should keep it in even for English. Thus the training set should look something like<br />

("table" nil (t ei b l))<br />

("suspicious" nil (s @ s p i sh @ s))<br />

It is best to split the data into a training set and a test set if you wish to know how well your training has worked. In<br />

our tests we remove every tenth entry and put it in a test set. Note this will mean our test results are probably better<br />

than if we removed say the last ten in every hundred.<br />

The second stage is to define the set of allowable letter to phone mappings irrespective of context. This can<br />

sometimes be initially done by hand then checked against the training set. Initially constract a file of the form<br />

(require 'lts_build)<br />

(set! allowables<br />

'((a _epsilon_)<br />

(b _epsilon_)<br />

(c _epsilon_)<br />

...<br />

(y _epsilon_)<br />

(z _epsilon_)<br />

(# #)))<br />

All letters that appear in the alphabet should (at least) map to _epsilon_, including any accented characters that<br />

appear in that language. Note the last two hashes. These are used by to denote beginning and end of word and are<br />

automatically added during training, they must appear in the list and should only map to themselves.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!