12.12.2012 Views

Festival Speech Synthesis System: - Speech Resource Pages


The segments of an utterance may be saved in a file using the function utt.save.segs, which saves the segments
of the named utterance in xlabel format. Any other stream may also be saved using the more general
utt.save.relation, which takes the additional argument of a relation name. The name and the end feature of
each item are saved in the named file, again in xlabel format; other features are saved in extra fields.
For more elaborate saving methods you can easily write a Scheme function to save data from an utterance in
whatever format is required. See the file `lib/mbrola.scm' for an example.

The function display provides a simple way to view an utterance in Entropic's Xwaves tool. It saves the
waveform and the segments and sends the appropriate commands to the Xwaves and xlabel programs, which must
already be running.

The function utt.resynth synthesizes an externally specified utterance. It takes two filename arguments, an
xlabel segment file and an F0 file, then loads, synthesizes and plays the utterance built from those files.
The loading is done by the underlying function utt.load.segf0.


15. Text analysis

15.1 Tokenizing: splitting text into tokens
15.2 Token to word rules
15.3 Homograph disambiguation: "Wed 5 may wind US Sen up"


15.1 Tokenizing

A crucial stage in text processing is the initial tokenization of text. A token in Festival is an atom separated
by whitespace from a text file (or string). If punctuation for the current language is defined, characters
matching that punctuation are removed from the beginning and end of a token and held as features of the token.
The default list of characters to be treated as whitespace is defined as

    (defvar token.whitespace " \t\n\r")

The default sets of punctuation and prepunctuation characters are

    (defvar token.punctuation "\"'`.,:;!?(){}[]")
    (defvar token.prepunctuation "\"'`({[")

These are declared in `lib/token.scm' but may be changed for different languages, text modes etc.
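The effect of these three character sets can be illustrated outside Festival. The following is a minimal sketch in Python, not Festival's own implementation: it splits a string on the default whitespace characters and then peels leading prepunctuation and trailing punctuation off each token. The names Token, split_token and tokenize, and the field names, are chosen for this illustration only. The handling of a token that consists entirely of punctuation is an assumption of the sketch.

```python
from typing import List, NamedTuple

# Defaults quoted from the text above (declared in lib/token.scm).
WHITESPACE = " \t\n\r"
PUNCTUATION = "\"'`.,:;!?(){}[]"
PREPUNCTUATION = "\"'`({["


class Token(NamedTuple):
    """One token: its bare name plus the punctuation held as features."""
    prepunctuation: str
    name: str
    punc: str


def split_token(raw: str) -> Token:
    """Peel leading prepunctuation and trailing punctuation off a raw token."""
    start = 0
    while start < len(raw) and raw[start] in PREPUNCTUATION:
        start += 1
    end = len(raw)
    while end > start and raw[end - 1] in PUNCTUATION:
        end -= 1
    if start == end:
        # Assumption of this sketch: a token made only of punctuation
        # keeps its text as the name instead of becoming empty.
        return Token("", raw, "")
    return Token(raw[:start], raw[start:end], raw[end:])


def tokenize(text: str) -> List[Token]:
    """Split text on whitespace characters, then split each raw token."""
    tokens = []
    current = ""
    for ch in text:
        if ch in WHITESPACE:
            if current:
                tokens.append(split_token(current))
            current = ""
        else:
            current += ch
    if current:
        tokens.append(split_token(current))
    return tokens


if __name__ == "__main__":
    for token in tokenize('He said, "Stop (now)!"'):
        print(token)
```

Under these assumptions the raw token `(now)!"` yields the name `now`, with `(` held as prepunctuation and `)!"` held as trailing punctuation. Changing the three constants models the effect of redefining the corresponding variables for another language or text mode.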


15.2 Token to word rules

Tokens are further analysed into lists of words. A word is an atom that can be given a pronunciation by the
lexicon (or letter to sound rules). A token may give rise to a number of words or none at all.

For example the basic tokens
