Festival Speech Synthesis System: - Speech Resource Pages
(defvar eou_tree
  '((n.whitespace matches ".*\n.*\n\\(.\\|\n\\)*") ;; 2 or more newlines
    ((1))
    ((punc in ("?" ":" "!"))
     ((1))
     ((punc is ".")
      ;; This is to distinguish abbreviations vs periods
      ;; These are heuristics
      ((name matches "\\(.*\\..*\\|[A-Z][A-Za-z]?[A-Za-z]?\\|etc\\)")
       ((n.whitespace is " ")
        ((0))                  ;; if abbrev single space isn't enough for break
        ((n.name matches "[A-Z].*")
         ((1))
         ((0))))
       ((n.whitespace is " ")  ;; if it doesn't look like an abbreviation
        ((n.name matches "[A-Z].*")  ;; single space and non-cap is no break
         ((1))
         ((0)))
        ((1))))
      ((0))))))
The token items this is applied to will always (except in the end-of-file case) include one following token, so
lookahead is possible. The "n.", "p." and "p.p." prefixes allow access to the surrounding token context. The features
name, whitespace and punc allow access to the contents of the token itself. At present there is no way to access
the lexicon from this tree, which is unfortunate, as it might be useful if certain abbreviations were identified as such there.
Note that these are heuristics, written by hand rather than trained from data, though problems have been fixed as they
have been observed in data. The above rules may make mistakes where abbreviations appear at the ends of lines, and where
improper spacing and capitalization are used. This is probably worth changing for modes where more casual text
appears, such as email messages and USENET news messages. A possible improvement would be to analyse
a text to find its basic threshold for utterance breaks (i.e. if no sequences of a full stop, two spaces and a capitalized
word appear and the text is of a reasonable length, then look for other criteria for utterance breaks).
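As an illustration of the kind of change meant here, the following is a minimal sketch of a more permissive tree for casual text. It is not part of Festival: the name casual_eou_tree is hypothetical, and the rule it encodes (a full stop on a token that does not look like an abbreviation always ends the utterance, whatever the spacing or capitalization that follows) is only one possible assumption about what casual text needs. It uses only the features and operators that appear in the tree above.

```scheme
;; Hypothetical variant of eou_tree for casual text (an assumption, not a
;; tree shipped with Festival).  The abbreviation branch is unchanged; the
;; non-abbreviation branch no longer tests n.whitespace or n.name.
(defvar casual_eou_tree
  '((n.whitespace matches ".*\n.*\n\\(.\\|\n\\)*") ;; 2 or more newlines
    ((1))
    ((punc in ("?" ":" "!"))
     ((1))
     ((punc is ".")
      ((name matches "\\(.*\\..*\\|[A-Z][A-Za-z]?[A-Za-z]?\\|etc\\)")
       ((n.whitespace is " ")
        ((0))                  ;; abbreviation and single space: no break
        ((n.name matches "[A-Z].*")
         ((1))
         ((0))))
       ((1)))                  ;; not an abbreviation: always break
      ((0))))))

;; Assumed way of putting it into effect: replace the global tree.
(set! eou_tree casual_eou_tree)
```

The trade-off is the opposite of the original tree's: text such as "see fig. 3" is still protected only when the token before the full stop matches the abbreviation pattern, so lower-case abbreviations not covered by that pattern would now cause spurious breaks.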
Ultimately, what we are trying to do is to chunk the text into utterances that can be synthesized quickly and start to
play them quickly, to minimise the time someone has to wait for the first sound when starting synthesis. Thus it would
be better if this chunking were done on prosodic phrases rather than on chunks more similar to linguistic sentences.
Prosodic phrases are bounded in size, while sentences are not.
9.2 Text modes
We do not believe that all texts are of the same type. Often information about the general contents of a file will aid
synthesis greatly. For example, in LaTeX files we do not want to hear "left brace, backslash e m" before each
emphasized word, nor do we necessarily want to hear formatting commands. Festival offers a basic method for
specifying customization rules depending on the mode of the text. By type we are following the notion of modes in
Emacs, and eventually will allow customization at a similar level.
Modes are specified as the third argument to the function tts. When using the Emacs interface to Festival the buffer
mode is automatically passed as the text mode. If the mode is not supported a warning message is printed and the raw
text mode is used.
Our initial text mode implementation allows configuration both in C++ and in Scheme. Obviously in C++ almost
anything can be done, but it is not as easy to reconfigure without recompilation. Here we will discuss those modes
which can be fully configured at run time.
A text mode may contain the following