Festival Speech Synthesis System: - Speech Resource Pages
This pocket-watch was made in 1983.
would give a word relation of
this pocket watch was made in nineteen eighty three
Because the relationship between tokens and words is in some cases complex, a user function may be specified for
translating tokens into words. This is designed to deal with things like numbers, email addresses, and other tokens
whose pronunciation is not obvious, mapping each token to zero or more words. Currently a builtin function
builtin_english_token_to_words offers much of the necessary functionality for English, but a user may
further customize this.
If the user defines a function token_to_words which takes two arguments, a token item and a token name, it will
be called by the Token_English and Token_Any modules. A substantial example is given as
english_token_to_words in `festival/lib/token.scm'. It is quite elaborate and covers most of the common multiword
tokens in English, including numbers, money symbols, Roman numerals, dates, times, plurals of symbols,
number ranges, telephone numbers and various other symbols.
Let us look at the treatment of one particular phenomenon which shows the use of these rules. Consider the expression
"$12 million", which should be rendered as the words "twelve million dollars". Note that the word "dollars", which is
introduced by the "$" sign, ends up after the end of the expression. There are two cases we need to deal with, as there
are two tokens. The first clause of the cond deals with the first token. It checks that the current token name is a
money value and that the following token is a magnitude (million, billion, trillion, zillion etc.). If that is the case, the
"$" is removed and the remaining number is pronounced by calling the builtin token-to-words function. The second
clause deals with the second token. It confirms that the previous token is a money value (using the same regular
expression as before) and then returns the word followed by the word "dollars". If the token matches neither of these
forms, the builtin function is called.
(define (token_to_words token name)
  "(token_to_words TOKEN NAME)
Returns a list of words for NAME from TOKEN."
  (cond
   ;; First token: a money value such as "$12" followed by a magnitude.
   ;; Drop the "$" and let the builtin function say the number.
   ((and (string-matches name "\\$[0-9,]+\\(\\.[0-9]+\\)?")
         (string-matches (item.feat token "n.name") ".*illion.?"))
    (builtin_english_token_to_words token (string-after name "$")))
   ;; Second token: a magnitude preceded by a money value.
   ;; Say the magnitude, then add "dollars".
   ((and (string-matches (item.feat token "p.name")
                         "\\$[0-9,]+\\(\\.[0-9]+\\)?")
         (string-matches name ".*illion.?"))
    (list
     name
     "dollars"))
   ;; Anything else: fall back to the builtin rules.
   (t
    (builtin_english_token_to_words token name))))
It is valid to make some conditions return no words, though care should be taken when doing so, as punctuation
information may no longer be available to later processing if there are no words related to a token.
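As an illustration of a condition that returns no words, the following is a minimal sketch of a token_to_words definition that silences tokens consisting only of hyphens. The pattern "--+" and the choice of which tokens to silence are assumptions made for this example; they are not taken from `festival/lib/token.scm'.

```scheme
;; Sketch only: the "--+" rule is a hypothetical example, not one of the
;; rules distributed with Festival.
(define (token_to_words token name)
  "(token_to_words TOKEN NAME)
Returns a list of words for NAME from TOKEN."
  (cond
   ;; A run of two or more hyphens, e.g. "-----", is given no words.
   ;; nil is the empty list, so no words are related to this token.
   ((string-matches name "--+")
    nil)
   ;; Everything else is left to the builtin English rules.
   (t
    (builtin_english_token_to_words token name))))

;; One way to inspect the result, assuming a voice is installed:
;; synthesize an utterance and print its Word relation.
(set! utt (Utterance Text "above ----- below"))
(utt.synth utt)
(utt.relation.print utt 'Word)
```

Because the silenced token has no words, any punctuation attached to it is subject to the caveat above, so a rule like this is best kept to tokens that carry no punctuation of interest.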
15.3 Homograph disambiguation
Not all tokens can be rendered as words easily. Their context may affect the way they are to be pronounced. For
example, in the utterance