12.12.2012 Views

Festival Speech Synthesis System: - Speech Resource Pages


This pocket-watch was made in 1983.

would give a word relation of

this pocket watch was made in nineteen eighty three

Because the relationship between tokens and words is in some cases complex, a user function may be specified for translating tokens into words. This is designed to deal with things like numbers, email addresses, and other non-obvious pronunciations of tokens as zero or more words. Currently a builtin function, builtin_english_token_to_words, offers much of the necessary functionality for English, but a user may further customize this.

If the user defines a function token_to_words which takes two arguments, a token item and a token name, it will be called by the Token_English and Token_Any modules. A substantial example is given as english_token_to_words in `festival/lib/token.scm'. It is quite elaborate and covers most of the common multiword tokens in English, including numbers, money symbols, Roman numerals, dates, times, plurals of symbols, number ranges, telephone numbers and various other symbols.

Let us look at the treatment of one particular phenomenon which shows the use of these rules. Consider the expression "$12 million", which should be rendered as the words "twelve million dollars". Note that the word "dollars", which is introduced by the "$" sign, ends up after the end of the expression. There are two cases to deal with, as there are two tokens. The first clause in the cond checks that the current token name is a money value and that the following token is a magnitude (million, billion, trillion, zillion etc.). If that is the case, the "$" is removed and the remaining number is pronounced by calling the builtin token-to-words function. The second clause deals with the second token. It confirms that the previous token is a money value (the same regular expression as before) and that the current token is a magnitude, and then returns the word followed by the word "dollars". If the token matches neither of these forms, the builtin function is called.

    (define (token_to_words token name)
      "(token_to_words TOKEN NAME)
    Returns a list of words for NAME from TOKEN."
      (cond
       ((and (string-matches name "\\$[0-9,]+\\(\\.[0-9]+\\)?")
             (string-matches (item.feat token "n.name") ".*illion.?"))
        (builtin_english_token_to_words token (string-after name "$")))
       ((and (string-matches (item.feat token "p.name")
                             "\\$[0-9,]+\\(\\.[0-9]+\\)?")
             (string-matches name ".*illion.?"))
        (list
         name
         "dollars"))
       (t
        (builtin_english_token_to_words token name))))
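
One way to check what such a function does is to synthesize an utterance and list the items in its Word relation. The following is a minimal sketch, assuming the token_to_words definition above has already been loaded into a running Festival with a voice installed; the variable name utt1 and the example sentence are illustrative only.

```scheme
;; Sketch only: assumes token_to_words (as defined above) is loaded
;; and a voice is available. utt1 is an arbitrary variable name.
(set! utt1 (Utterance Text "The deal was worth $12 million in total."))

;; Run the synthesis modules, including the Token module that
;; calls token_to_words for each token.
(utt.synth utt1)

;; Collect the names of the items in the Word relation.
(mapcar
 (lambda (w) (item.name w))
 (utt.relation.items utt1 'Word))
```

If the two money clauses fire as described, the returned list should contain "dollars" after the magnitude word rather than before the number.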

It is valid to make some conditions return no words, though some care should be taken with that, as punctuation information may no longer be available to later processing if there are no words related to a token.
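
As a hedged illustration of a condition that returns no words, the sketch below drops tokens made up entirely of "=" characters, such as a ruler line in plain text. The choice of pattern is an assumption made for this example and is not taken from `lib/token.scm'; all other tokens are passed to the builtin function unchanged.

```scheme
;; Sketch only: the "=+" pattern is a hypothetical example of a
;; token that should be silent.
(define (token_to_words token name)
  "(token_to_words TOKEN NAME)
Returns a list of words for NAME from TOKEN."
  (cond
   ;; A token consisting only of "=" characters yields no words.
   ;; Any punctuation attached to this token is then unavailable
   ;; to later modules, since no word is related to the token.
   ((string-matches name "=+")
    nil)
   ;; Everything else gets the default English treatment.
   (t
    (builtin_english_token_to_words token name))))
```

Because a token that yields no words leaves nothing in the Word relation to carry its punctuation, this is best reserved for tokens whose punctuation does not matter to later processing such as phrasing.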


15.3 Homograph disambiguation

Not all tokens can be rendered as words easily. Their context may affect the way they are to be pronounced. For example in the utterance
