12.12.2012 Views

Festival Speech Synthesis System: - Speech Resource Pages

Festival Speech Synthesis System: - Speech Resource Pages

Festival Speech Synthesis System: - Speech Resource Pages

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

(defvar eou_tree<br />

'((n.whitespace matches ".*\n.*\n\\(.\\|\n\\)*") ;; 2 or more newlines<br />

((1))<br />

((punc in ("?" ":" "!"))<br />

((1))<br />

((punc is ".")<br />

;; This is to distinguish abbreviations vs periods<br />

;; These are heuristics<br />

((name matches "\\(.*\\..*\\|[A-Z][A-Za-z]?[A-Za-z]?\\|etc\\)")<br />

((n.whitespace is " ")<br />

((0)) ;; if abbrev single space isn't enough for break<br />

((n.name matches "[A-Z].*")<br />

((1))<br />

((0))))<br />

((n.whitespace is " ") ;; if it doesn't look like an abbreviation<br />

((n.name matches "[A-Z].*") ;; single space and non-cap is no break<br />

((1))<br />

((0)))<br />

((1))))<br />

((0)))))<br />

The token items this is applied to will always (except in the end of file case) include one following token, so look<br />

ahead is possible. The "n." and "p." and "p.p." prefixes allow access to the surrounding token context. The features<br />

name, whitespace and punc allow access to the contents of the token itself. At present there is no way to access<br />

the lexicon form this tree which unfortunately might be useful if certain abbreviations were identified as such there.<br />

Note these are heuristics and written by hand not trained from data, though problems have been fixed as they have<br />

been observed in data. The above rules may make mistakes where abbreviations appear at end of lines, and when<br />

improper spacing and capitalization is used. This is probably worth changing, for modes where more casual text<br />

appears, such as email messages and USENET news messages. A possible improvement could be made by analysing<br />

a text to find out its basic threshold of utterance break (i.e. if no full stop, two spaces, followed by a capitalized word<br />

sequences appear and the text is of a reasonable length then look for other criteria for utterance breaks).<br />

Ultimately what we are trying to do is to chunk the text into utterances that can be synthesized quickly and start to<br />

play them quickly to minimise the time someone has to wait for the first sound when starting synthesis. Thus it would<br />

be better if this chunking were done on prosodic phrases rather than chunks more similar to linguistic sentences.<br />

Prosodic phrases are bounded in size, while sentences are not.<br />

[ < ] [ > ] [ > ] [Top] [Contents] [Index] [ ? ]<br />

9.2 Text modes<br />

We do not believe that all texts are of the same type. Often information about the general contents of file will aid<br />

synthesis greatly. For example in Latex files we do not want to here "left brace, backslash e m" before each<br />

emphasized word, nor do we want to necessarily hear formating commands. <strong>Festival</strong> offers a basic method for<br />

specifying customization rules depending on the mode of the text. By type we are following the notion of modes in<br />

Emacs and eventually will allow customization at a similar level.<br />

Modes are specified as the third argument to the function tts. When using the Emacs interface to <strong>Festival</strong> the buffer<br />

mode is automatically passed as the text mode. If the mode is not supported a warning message is printed and the raw<br />

text mode is used.<br />

Our initial text mode implementation allows configuration both in C++ and in Scheme. Obviously in C++ almost<br />

anything can be done but it is not as easy to reconfigure without recompilation. Here we will discuss those modes<br />

which can be fully configured at run time.<br />

A text mode may contain the following

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!